ORAL SESSION 01
LEARNING AND GENERALIZATION


8:30  01.1  BIRDSONG  LEARNING 

MARK  KONISHI,  Division  of  Biology,  California  Institute  of  Technology 
Invited  talk. 


9:10 01.2 COMPARING GENERALIZATION BY HUMANS AND ADAPTIVE NETWORKS

M.  PAVEL,  MARK  A.  GLUCK,  VAN  HENKLE,  Department  of  Psychology,  Stanford  University 

Generalization  of  a  pattern  categorization  task  was  investigated  in  a  simple,  deterministic,  inductive 
learning  task.  Each  of  eight  patterns  in  a  training  set  was  specified  in  terms  of  four  binary  features. 
After  subjects  learned  to  categorize  these  patterns  in  a  supervised  learning  paradigm,  they  were 
asked to generalize their knowledge by categorizing novel patterns. We analyzed both the details of
the learning process and subjects’ generalizations to novel patterns. Certain patterns in the
training set were consistently found to be more difficult to learn than others. The subsequent
generalizations made by subjects indicate that, in spite of important individual differences, subjects
showed systematic similarities in how they generalized to novel situations. The generalizations
made by subjects were compared to those that could be generated by a two-layer
adaptive network. A comparison of network and human generalizations indicates that using a
minimal network architecture is not by itself a sufficient constraint to guarantee that a network will
generalize the way humans do.


9:40  01.3  AN  OPTIMALITY  PRINCIPLE  FOR  UNSUPERVISED  LEARNING 

T. SANGER, Massachusetts Institute of Technology AI Laboratory

We  present  a  general  optimality  criterion  for  unsupervised  learning  which  can  be  used  to  design 
training algorithms. This criterion leads to the "Principle of Maximum Variance," which describes a
method for choosing hidden layers in a multilayer network. We prove that this method is optimal, and
in certain cases corresponds to the Karhunen-Loeve transform. We derive a new learning algorithm
and we give an example of its use for a computer vision system. The algorithm finds significant local
"features" in real images and can perform texture segmentation. Our results apply to both linear and
nonlinear  nets,  and  provide  a  rigorous  mathematical  basis  for  the  study  of  unsupervised  learning  in 
feedforward  neural  networks. 

01.4 LEARNING BY EXAMPLE WITH HINTS

YASER S. ABU-MOSTAFA, Departments of Electrical Engineering and Computer Science, California
Institute  of  Technology 

Learning  by  example  is  the  process  of  mechanically  producing  an  (exact  or  approximate) 
implementation  of  a  function  based  merely  on  a  (large)  number  of  instances  of  the  function. 
Recently,  there  have  been  a  number  of  results  that  show  learning  by  example  to  be  NP-complete, 
hence  probably  intractable.  Experience  tells  us  that  the  learning  process  is  drastically  simplified  if  we 
take advantage of the "hints" we know about the function, instead of just using examples blindly. We
address  the  formalization  of  what  hints  are  and  how  they  may  reduce  the  complexity  of  learning  by 
example. 


01.5 ASSOCIATIVE LEARNING VIA INHIBITORY SEARCH

DAVID  H.  ACKLEY,  Cognitive  Science  Research  Group,  Bell  Communications  Research 

ALVIS  is  a  reinforcement-based  connectionist  architecture  that  learns  associative  maps  in  continuous 
multidimensional environments. The discovered locations of positive and negative reinforcements
are recorded in "do be" and "don't be" subnetworks, respectively. The outputs of the subnetworks
relevant  to  the  current  goal  are  combined  and  compared  with  the  current  location  to  produce  an  error 
vector.  This  vector  is  backpropagated  through  a  motor-perceptual  mapping  network  to  produce  an 
action  vector  that  leads  the  system  towards  do-be  locations  and  away  from  don't-be  locations.  ALVIS 
is  demonstrated  with  a  simulated  robot  posed  a  target-seeking  task. 


01.6 SPEEDY ALTERNATIVES TO BACK PROPAGATION

JOHN  MOODY,  CHRIS  DARKEN,  Computer  Science  Department,  Yale  University 

We propose two neurally-inspired learning algorithms which offer much greater speed than Back
Propagation. These algorithms are "Self-Organized Learning With Receptive Fields" and a
multi-resolution, interpolating variant of the Cerebellar Model Articulation Controller (CMAC). Both
algorithms share three critical features: they have only one layer of internal units, they
utilize  a  self-organized  representation  of  the  input  space  on  the  internal  layer,  and  their  representation 
of  the  input  space  is  localized  or  only  slightly  distributed.  Furthermore,  the  CMAC  learning  rule 
requires  modification  of  only  the  output  weights.  These  features  result  in  increased  simulation  speed. 
In  detailed  comparisons  to  Back  Propagation  for  the  problem  of  predicting  Chaotic  Time  Series,  these 
new  algorithms  learn  as  much  as  one  thousand  times  faster  than  a  very  fast  implementation  of  Back 
Propagation  (conjugate  gradient)  while  achieving  comparable  prediction  capability  on  test  data.  Back 
Propagation,  however,  achieves  its  performance  with  a  smaller  set  of  training  data.  These  algorithms 
are  likely  to  provide  similar  speed  increases  in  other  problem  domains.  The  self-organizing  receptive 
field model is in principle implementable as an analog dynamical system. The CMAC can be
conveniently  implemented  purely  digitally  or  as  a  hybrid  digital/analog  system. 

P1.1 EFFICIENT PARALLEL LEARNING ALGORITHMS FOR NEURAL NETWORKS

ALAN KRAMER, ALBERTO SANGIOVANNI-VINCENTELLI, Department of Electrical Engineering
and  Computer  Science,  University  of  California,  Berkeley 

We  are  interested  in  parallel  algorithms  for  quickly  finding  local  minima  in  the  weightspace  of 
feedforward neural-net learning problems. Backpropagation is unsuitable for our needs because of
its poor convergence properties. We have implemented a partial conjugate-gradient algorithm based
on the Polak-Ribiere rule and curve fitting techniques. This algorithm has good convergence
properties and in practice we find that it always outperforms backprop in terms of number of training
set sweeps to convergence. Because our algorithm has small storage requirements it is well suited
for  parallel  implementation  on  the  Connection  Machine. 
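
As an illustration of the kind of method named in this abstract, the following is a minimal sketch of conjugate-gradient descent with the Polak-Ribiere rule, using a three-point parabola fit as the line search. It is not the authors' implementation: the function names, the restart rule (clipping beta at zero), the sample points of the parabola fit, and the small quadratic test problem are all assumptions made for the example.

```python
def polak_ribiere_cg(grad, line_min, w, iters=10, tol=1e-12):
    """Minimise a function by conjugate gradient with the Polak-Ribiere rule."""
    g = grad(w)
    d = [-gi for gi in g]                      # first direction: steepest descent
    for _ in range(iters):
        if sum(gi * gi for gi in g) < tol:     # gradient is (numerically) zero
            break
        step = line_min(w, d)                  # 1-D minimisation along d
        w = [wi + step * di for wi, di in zip(w, d)]
        g_new = grad(w)
        # Polak-Ribiere: beta = g_new . (g_new - g) / (g . g); clip at 0 to restart
        beta = sum(gn * (gn - go) for gn, go in zip(g_new, g)) / sum(go * go for go in g)
        beta = max(beta, 0.0)
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        g = g_new
    return w


# Hypothetical test problem: f(w) = 1/2 w^T A w - b^T w, minimised where A w = b.
A = [[3.0, 1.0], [1.0, 2.0]]
b = [1.0, 1.0]

def f(w):
    Aw = [sum(A[i][j] * w[j] for j in range(2)) for i in range(2)]
    return 0.5 * sum(w[i] * Aw[i] for i in range(2)) - sum(b[i] * w[i] for i in range(2))

def grad(w):
    return [sum(A[i][j] * w[j] for j in range(2)) - b[i] for i in range(2)]

def parabola_line_min(w, d):
    """Fit a parabola through f at t = 0, 1, 2 along d and return its minimiser."""
    f0, f1, f2 = (f([wi + t * di for wi, di in zip(w, d)]) for t in (0.0, 1.0, 2.0))
    a = (f2 - 2.0 * f1 + f0) / 2.0             # coefficient of t^2
    slope = f1 - f0 - a                        # coefficient of t
    return -slope / (2.0 * a)

w_min = polak_ribiere_cg(grad, parabola_line_min, [0.0, 0.0])
```

Only the current weights, gradient, and search direction are stored, which is the small-storage property the abstract points to. On a real network error surface the parabola fit would be approximate and would need safeguards that this sketch omits.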

P1.2 PROPERTIES OF A HYBRID NEURAL NETWORK-CLASSIFIER SYSTEM

LAWRENCE  DAVIS,  Bolt  Beranek  and  Newman  Laboratories,  Cambridge,  MA 

A  machine  learning  system  is  described,  together  with  procedures  for  translating  some  representative 
neural  networks  and  classifier  systems  into  it.  Procedures  for  translating  some  representative 
learning  mechanisms  of  neural  networks  and  classifier  systems  into  the  hybrid  system  are  also  given. 
The  paper  shows  how  learning  procedures  such  as  genetic  operators  and  Hebbian  reinforcement 
thought  applicable  only  to  one  or  the  other  sort  of  learning  system  may  be  cross-applied  in  the  hybrid 
representation.  The  paper  concludes  with  a  discussion  of  interesting  consequences  of  these  results. 

P1.3 SELF ORGANIZING NEURAL NETWORKS FOR THE IDENTIFICATION PROBLEM

M.F.  TENORIO,  WEI-TSIH  LEE,  School  of  Electrical  Engineering,  Purdue  University 

Identification  of  the  system  model  plays  an  important  role  in  various  engineering  fields,  including 
control,  computer  vision  and  speech,  and  adaptive  filtering.  In  this  paper,  we  address  the 
identification  of  nonparametric  system  models. 

The Group Method of Data Handling (GMDH) algorithm was conceived by Ivakhnenko in 1969. This
algorithm embodies the idea of self organizing structures based on the Least Mean Square Error.
The structure is generated by adding a perceptron-like layer, composed of two-input neurons which
have quadratic transfer functions. The GMDH algorithm, using a heuristic pruning criterion, generates
only feedforward suboptimal structures.

In  order  to  identify  optimal  structures  in  the  sense  of  complexity,  we  propose  a  new  and  more 
rigorous self organizing algorithm. Drawing on Information Theory, the algorithm uses the Minimum
Description Length criterion to guide a stochastic search in the function space, based on a modified
simulated annealing.
When compared with previously known algorithms, this method allows for more general structures to
be  identified,  and  seems  to  choose  the  one  that  is  optimal.  The  results  of  a  complex  polynomial 
system  identification  will  be  shown  and  discussed. 

P1.4 COMPARISON OF MULTILAYER NETWORKS AND DATA ANALYSIS

P. GALLINARI, S. THIRIA*, F. FOGELMAN SOULIE, Laboratoire d’Intelligence Artificielle, Universite
de Paris (*also at Conservatoire National des Arts et Metiers)

This paper will show how classical tools of Data Analysis can be compared with multilayer networks
trained with the Gradient Back Propagation algorithm.

We will also present methods that allow these tools to be used to improve the performance of the networks
by providing adequate pre-processing of the data and indications of the appropriate number of
hidden  units. 

P1.5 NEURAL NETWORKS AND PRINCIPAL COMPONENT ANALYSIS: LEARNING FROM
EXAMPLES WITHOUT LOCAL MINIMA

PIERRE BALDI, KURT HORNIK*, Department of Mathematics, University of California, San Diego
(*permanently at Technische Universität Wien, Vienna)

We consider the problem of learning from examples in layered linear feedforward neural networks
using optimization methods, such as back propagation, with respect to the usual quadratic error
function  E  of  the  connection  weights.  Our  main  result  is  a  complete  description  of  the  landscape 
attached  to  E  in  terms  of  principal  component  analysis.  We  show  that  E  has  a  unique  minimum 
corresponding  to  the  projection  onto  the  subspace  generated  by  the  first  principal  vectors  of  a 
covariance  matrix  associated  with  the  training  patterns.  All  the  additional  critical  points  of  E  are 
saddle  points  (corresponding  to  projections  onto  subspaces  generated  by  higher  order  vectors).  The 
auto-associative  case  is  examined  in  detail.  Extensions  and  implications  for  the  learning  algorithms 
are  discussed. 
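
The link between quadratic error and principal components can be checked numerically in the simplest auto-associative setting. The sketch below is an illustration under stated assumptions, not the authors' derivation: it takes a small, made-up set of 2-D points, restricts the linear map to an orthogonal projection onto a single direction (one hidden unit), and compares the reconstruction error for different directions. All names and data are hypothetical.

```python
import math

def covariance(points):
    """Return the mean and the entries (sxx, sxy, syy) of the 2x2 covariance matrix."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return (mx, my), (sxx, sxy, syy)

def principal_angle(sxx, sxy, syy):
    """Angle of the leading eigenvector of [[sxx, sxy], [sxy, syy]]."""
    return 0.5 * math.atan2(2.0 * sxy, sxx - syy)

def projection_error(points, mean, angle):
    """Mean squared error after projecting centred points onto a unit direction."""
    ux, uy = math.cos(angle), math.sin(angle)
    err = 0.0
    for x, y in points:
        cx, cy = x - mean[0], y - mean[1]
        t = cx * ux + cy * uy                  # hidden-unit activation
        err += (cx - t * ux) ** 2 + (cy - t * uy) ** 2
    return err / len(points)

# Made-up training patterns, used only for illustration.
pts = [(2.0, 1.0), (-2.0, -1.0), (1.0, 1.0), (-1.0, -1.0), (3.0, 1.0), (-3.0, -1.0)]
mean, (sxx, sxy, syy) = covariance(pts)
theta = principal_angle(sxx, sxy, syy)
best_err = projection_error(pts, mean, theta)                   # first principal vector
worst_err = projection_error(pts, mean, theta + math.pi / 2.0)  # second principal vector
```

In this restricted setting the error along the first principal direction equals the smaller eigenvalue of the covariance matrix, and no other direction does better; the projection onto the second principal direction gives the largest error among these projections.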

P1.6 LEARNING BY CHOICE OF INTERNAL REPRESENTATIONS

TAL GROSSMAN, RONNY MEIR*, EYTAN DOMANY, Weizmann Institute of Science (*currently
at Division of Chemistry, California Institute of Technology)

We  introduce  a  learning  algorithm  for  three-layer  feedforward  neural  networks  (two-layer  perceptrons) 
composed  of  binary  linear  threshold  elements.  Whereas  existing  algorithms  reduce  the  learning 
process  to  minimizing  a  cost  function  over  the  weights,  our  method  treats  the  internal  representations 
as the fundamental entities to be determined. Once a correct set of internal representations is arrived
at, the weights are found by the local and biologically plausible Perceptron Learning Rule (PLR). We
tested  our  learning  algorithm  on  three  problems:  adjacency,  symmetry  and  parity. 

P1.7 WHAT SIZE NET GIVES VALID GENERALIZATION?

ERIC  B.  BAUM,  Jet  Propulsion  Laboratory,  California  Institute  of  Technology;  DAVID  HAUSSLER, 
Department  of  Computer  and  Information  Sciences,  University  of  California,  Santa  Cruz 

We  consider  loading  a  training  database  onto  a  fixed  neural  net  by  back  propagation  or  other  learning 
methods,  and  address  the  question  of  when  valid  generalization  can  be  expected.  We  assume  only 
that we are trying to learn some fixed, unknown concept which classifies any example as positive or
negative,  and  that  training  examples  and  future  test  examples  are  drawn  independently  at  random  (as 
in  Valiant's  learning  protocol)  from  some  fixed,  unknown,  arbitrary  probability  distribution.  Under 
these assumptions we show how general learning results derived using the combinatorial notion of
the "Vapnik-Chervonenkis" (VC) dimension (related to Cover's notion of "capacity") can be applied
to the problem of generalization in neural nets. In particular, using a recent calculation of the VC
dimension  of  feedforward  neural  nets,  we  give  tight  bounds  on  the  size  net  one  should  attempt  to  load 
a  given  training  database  of  m  examples  on,  if  one  wishes  valid  generalization  to  future  examples. 

P1.8 MEAN FIELD ANNEALING AND NEURAL NETWORKS

B.  BILBRO,  T.K.  MILLER,  W.  SNYDER,  D.  VAN  DEN  BOUT,  M.  WHITE,  Department  of  Electrical 
and  Computer  Engineering,  North  Carolina  State  University,  Raleigh;  R.  MANN,  Engineering  Physics 
and  Mathematics  Division,  Oak  Ridge  National  Laboratory 

Nearly  optimal  solutions  to  many  combinatorial  problems  can  be  found  using  simulated  annealing 
(SA).  This  paper  uses  mean  field  theory  to  replace  the  discrete  degrees  of  freedom  manipulated  in 
simulated  annealing  with  their  continuous  averages.  The  convergence  of  this  mean  field  annealing 
(MFA)  technique  is  1-2  orders  of  magnitude  faster  than  that  of  simulated  annealing  yet  causes  no 
degradation in the quality of the final solutions. The performance of MFA is demonstrated upon
several example problems: graph partitioning, Boltzmann neural network convergence, and image
restoration.  A  linkage  is  established  between  MFA  and  Hopfield  neural  networks  which  has  important 
ramifications  in  the  analysis  and  control  of  such  networks. 
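
The replacement of discrete variables by continuous averages can be sketched on a toy problem. The code below assumes the standard mean-field form for two-state (+1/-1) variables, in which each average is updated as the hyperbolic tangent of its local field divided by the temperature. The coupling matrix, the initial values, and the cooling schedule are hypothetical and are not taken from the paper.

```python
import math

def mean_field_anneal(J, v, t_start=1.5, t_end=0.05, cooling=0.8, sweeps=20):
    """Anneal mean values v_i in [-1, 1] for the energy E = -1/2 sum_ij J_ij s_i s_j."""
    v = list(v)
    T = t_start
    while T > t_end:
        for _ in range(sweeps):
            for i in range(len(v)):
                field = sum(J[i][j] * v[j] for j in range(len(v)) if j != i)
                v[i] = math.tanh(field / T)    # mean-field average of a +/-1 variable
        T *= cooling                           # lower the temperature
    return v

# Toy couplings: units 0 and 1 prefer to agree, unit 2 prefers to oppose both.
J = [[0.0, 1.0, -1.0],
     [1.0, 0.0, -1.0],
     [-1.0, -1.0, 0.0]]

# Small nonzero starting values break the symmetry between the two mirror solutions.
v_final = mean_field_anneal(J, [0.1, 0.1, -0.1])
```

As the temperature falls, the averages saturate toward +1 or -1 and can be read off as a discrete solution. Each update is deterministic, in contrast to the random accept/reject moves of simulated annealing.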

P1.9 CONNECTIONIST LEARNING OF EXPERT PREFERENCES BY COMPARISON
TRAINING 

GERALD  TESAURO,  Center  for  Complex  Systems  Research,  University  of  Illinois  at  Urbana- 
Champaign 

A  new  training  paradigm,  called  the  "comparison  paradigm,"  is  introduced  for  tasks  in  which  a 
network  must  learn  to  choose  a  preferred  pattern  from  a  set  of  n  alternatives,  based  on  examples  of 
human  expert  preferences.  In  this  paradigm,  the  input  to  the  network  consists  of  two  of  the  n 
alternatives,  and  the  trained  output  is  the  expert’s  judgement  of  which  pattern  is  better.  This 
paradigm  is  applied  to  the  learning  of  backgammon,  a  difficult  board  game  in  which  the  expert  selects 
a  move  from  a  set  of  legal  moves.  With  comparison  training,  much  higher  levels  of  performance  can 
be  achieved,  with  networks  that  are  much  smaller,  and  with  coding  schemes  that  are  much  simpler 
and  easier  to  understand.  Furthermore,  it  is  possible  to  set  up  the  network  so  that  it  always  produces 
consistent  rank-orderings. 

P1.10 DYNAMIC HYPOTHESIS FORMATION IN CONNECTIONIST NETWORKS

MICHAEL  C.  MOZER,  Departments  of  Psychology  and  Computer  Science,  University  of  Toronto 

This  paper  proposes  a  way  of  using  the  knowledge  in  a  network  to  determine  the  functionality  of 
individual  units  and  connections.  The  basic  idea  is  a  bootstrapping  approach:  Take  the  network  in  its 
current  state  after  some  amount  of  training;  use  back  propagation  to  compute  the  influence  that 
particular  units  have  on  the  output  state;  suppress  the  least  salient  units  and  continue  training.  This 
technique  can  be  used  to  improve  learning  performance  when  the  input  contains  irrelevant  or 
redundant  information,  improve  generalization  by  eliminating  noise,  and  decrease  the  number  of 
weight  parameters  in  the  network. 

P1.11 THE BOLTZMANN PERCEPTRON: A MULTI-LAYERED FEED-FORWARD NETWORK
EQUIVALENT TO THE BOLTZMANN MACHINE

EYAL  YAIR,  ALLEN  GERSHO,  Center  for  Information  Processing  Research,  Department  of  Electrical 
and  Computer  Engineering,  University  of  California,  Santa  Barbara 

A deterministic, feed-forward network, called the Boltzmann Perceptron, is introduced which has some
of  the  characteristics  of  a  multi-layer  perceptron  and  is  functionally  equivalent  to  the  Boltzmann 
machine  for  fuzzy  pattern  classification.  A  learning  algorithm  for  this  classifier  is  described.  The 
classifier performance and the learning algorithm will be demonstrated for solving a detection problem
with  Gaussian  sources. 

P1.12 ADAPTIVE NEURAL-NET PREPROCESSING FOR SIGNAL DETECTION IN
NON-GAUSSIAN NOISE

PAUL  E.  BECKMAN,  RICHARD  P.  LIPPMANN,  Massachusetts  Institute  of  Technology  Lincoln 
Laboratory 

A nonlinearity is required before matched filtering in minimum error receivers when the additive noise is
impulsive and highly non-Gaussian. Experiments were performed to determine
whether  the  correct  clipping  nonlinearity  could  be  provided  by  a  single-input  single-output  multi-layer 
perceptron  trained  with  back  propagation.  It  was  found  that  multi-layer  perceptrons  with  different 
numbers  of  layers  and  hidden  nodes  could  be  trained  to  provide  the  types  of  nonlinearities  required 
with  fewer  than  5,000  presentations  of  noiseless  and  corrupted  waveform  samples.  A  trained 
network  used  as  a  front  end  for  a  linear  matched  filter  detector  greatly  reduced  the  probability  of  error. 
In one representative condition the signal detection error rate dropped from 26% with a linear front
end to 4% with a trained net.

P1.13 TRAINING MULTILAYER PERCEPTRONS WITH THE EXTENDED KALMAN
ALGORITHM

SHARAD  SINGHAL,  LANCE  WU,  Bell  Communications  Research,  Morristown,  NJ 

Multilayer  perceptrons  are  usually  trained  using  the  back-propagation  algorithm  described  by 
Rumelhart  et  al.  In  this  algorithm  weight  updates  are  made  based  on  the  gradient  computed  from 
only  the  current  inputs  and  outputs;  gradient  information  from  previous  data  is  ignored.  Thus  many 
weight  changes  are  inconsistent  and  convergence  is  slow.  In  complex  problems,  thousands  of 
iterations may be required for convergence.

In  this  paper,  we  apply  the  extended  Kalman  algorithm  to  multilayer  perceptrons.  Although  it  is 
computationally  complex,  this  algorithm  updates  weights  consistent  with  all  previously  seen  data  and 
usually converges in a few iterations. We describe the algorithm and compare it with
back-propagation using several examples.

P1.14 GEMINI: GRADIENT ESTIMATION THROUGH MATRIX INVERSION AFTER NOISE
INJECTION 

Y.  LECUN,  C.C.  GALLAND,  G.E.  HINTON,  Computer  Science  Department  and  Physics  Department, 
University  of  Toronto 

Back-Propagation  is  more  efficient,  but  less  neurally  plausible,  than  reinforcement  learning  that 
correlates random perturbations with changes in reinforcement. GEMINI is a hybrid procedure,
readily implementable in hardware, which provides a biologically plausible model of gradient-descent
learning in multilayer networks while approaching the efficiency of BP-type procedures.

GEMINI  injects  noise  only  at  the  first  hidden  layer  and  measures  the  resultant  effect  on  the  output 
error.  A  linear  network  associated  with  each  hidden  layer  iteratively  inverts  the  matrix  which  relates 
the  noise  to  the  error  change,  thereby  obtaining  the  error-derivatives.  No  back-propagation  is 
involved,  allowing  unknown  non-linearities  in  the  system. 

P1.15 ANALYSIS OF RECURRENT BACKPROPAGATION

PATRICE  Y.  SIMARD,  MARY  B.  OTTAWAY,  DANA  H.  BALLARD,  Department  of  Computer  Science, 
University  of  Rochester 

This  paper  attempts  a  systematic  analysis  of  the  recurrent  backpropagation  (RBP)  algorithm.  We  first 
show  that  there  is  a  potential  problem  in  that  RBP  always  has  unstable  fixed  points.  We  show  by 
experiment  and  eigenvalue  analysis  that  this  is  not  the  case.  Next  we  examine  the  advantages  of 
RBP  over  the  standard  backpropagation  algorithm.  RBP  is  shown  to  build  stable  fixed  points 
corresponding  to  the  input  patterns.  This  makes  it  an  appropriate  tool  for  content  addressable 
memory.  Finally,  we  show  that  the  introduction  of  a  non-local  search  technique  such  as  simulated 
annealing has a dramatic effect on a network's ability to learn patterns.

P1.16 SCALING AND GENERALIZATION IN NEURAL NETWORKS: A CASE STUDY

SUBUTAI  AHMAD,  GERALD  TESAURO,  Center  for  Complex  Systems  Research,  University  of 
Illinois  at  Urbana-Champaign 

The  issues  of  scaling  and  generalization  are  studied  in  the  context  of  the  majority  function.  We  find 
that  the  failure  rate,  the  fraction  of  misclassified  test  instances,  falls  off  exponentially  with  the  training 
set  size.  The  number  of  training  patterns  required  to  achieve  a  fixed  performance  level  increases 
linearly  with  the  number  of  input  units.  It  is  shown  that  a  boost  in  the  performance  level  can  be 
obtained  by  a  simple  change  in  the  input  representation.  It  is  also  shown  that  the  most  useful  training 
examples  are  the  ones  closest  to  the  separating  surface. 

P1.17 DOES THE NEURON "LEARN" LIKE THE SYNAPSE?

RAOUL  TAWEL,  Jet  Propulsion  Laboratory,  California  Institute  of  Technology 

We  describe  an  improved  learning  paradigm  that  promises  to  offer  a  significant  reduction  in 
computation  time  during  the  supervised  learning  phase.  It  is  based  on  extending  the  role  that  the 
neuron  plays  in  artificial  neural  systems.  Prior  work  has  regarded  the  neuron  as  a  strictly  passive 
non-linear  processing  element,  and  the  synapse  on  the  other  hand  as  the  primary  source  of 
information  processing.  In  this  work,  the  role  of  the  neuron  is  extended  and  provides  a  secondary 
source  of  information  processing.  This  is  achieved  by  treating  both  the  neuronal  and  synaptic 
parameters  on  an  equal  basis.  The  temperature  (i.e.  gain)  of  the  sigmoid  function  is  an  example  of 
such a parameter. In much the same way that the synaptic interconnection weights $W_{ij}^{n}$ require
optimization to reflect the knowledge contained within the training set, so do the temperature terms
$T_{i}^{n}$. The indices i and n refer to the temperature of the ith neuron on the nth layer of the
network.  Clearly,  the  method  does  not  explicitly  optimize  a  global  temperature  for  the  network,  but 
allows  each  neuron  to  possess  and  update  its  own  characteristic  local  temperature.  This  algorithm 
has been applied to logic-type problems, such as the XOR or parity problem, and significantly
decreases  the  learning  time  on  the  posed  problems.  For  example,  in  the  XOR  problem,  the  training 
time  was  decreased  by  over  three  orders  of  magnitude. 
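
Treating a neuron's temperature on an equal footing with its weights means that the output has a derivative with respect to both. The sketch below assumes, for illustration only, that the temperature T enters as sigmoid(a / T), where a is the weighted input sum; the abstract does not give the exact form, and the function names are hypothetical.

```python
import math

def neuron_output(w, x, T):
    """Sigmoid neuron with a local temperature T (assumed form: sigmoid(a / T))."""
    a = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-a / T))

def output_gradients(w, x, T):
    """Return (dy/dw, dy/dT) for y = sigmoid((w . x) / T)."""
    a = sum(wi * xi for wi, xi in zip(w, x))
    y = 1.0 / (1.0 + math.exp(-a / T))
    slope = y * (1.0 - y)                      # derivative of the sigmoid at a / T
    dy_dw = [slope * xi / T for xi in x]       # chain rule through a / T w.r.t. each weight
    dy_dT = -slope * a / (T * T)               # chain rule through a / T w.r.t. temperature
    return dy_dw, dy_dT

# Example values, chosen arbitrarily.
w, x, T = [0.5, -0.3], [1.0, 2.0], 0.8
dy_dw, dy_dT = output_gradients(w, x, T)
```

With both derivatives available, a gradient-descent learning rule can update each neuron's temperature alongside its incoming weights, which is the sense in which every neuron keeps its own local temperature.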

P1.18 EXPERIMENTS ON NETWORK LEARNING BY EXHAUSTIVE SEARCH

D.B. SCHWARTZ, J.S. DENKER, S.A. SOLLA, AT&T Bell Laboratories, Holmdel, NJ

We  have  performed  experiments  in  which  learning  is  explicitly  formulated  as  a  global  search  through 
the  set  of  possible  networks.  We  applied  this  to  a  problem  that  has  also  been  extensively  explored  by 
conventional  local  iterative  improvement  techniques  (e.g.,  back  propagation).  We  have  measured  the 
error rate and the final entropy ($S_m$) of the network-ensemble after training with m examples, and find
that  they  agree  qualitatively  with  predictions  of  our  simple  theory. 

P1.19 SOME COMPARISONS OF CONSTRAINTS FOR MINIMAL NETWORK CONSTRUCTION
WITH  BACKPROPAGATION 

STEPHEN JOSE HANSON, Bell Communications Research, Morristown, NJ; LORIEN Y. PRATT,
Rutgers  University 

Rumelhart has proposed a method for choosing minimal representations during learning in
Backpropagation  networks.  This  approach  can  be  used  to  (a)  dynamically  select  the  number  of 
hidden  units,  (b)  construct  a  representation  that  is  appropriate  for  the  problem  and  (c)  thus  improve 
the  generalization  ability  of  Backpropagation  networks. 

The  method  Rumelhart  suggests  involves  adding  penalty  terms  to  the  usual  error  function.  These 
terms  will  in  effect  cause  some  weights  to  decay  sooner  than  others,  essentially  disconnecting  parts 
of  the  network  from  one  another.  Various  terms  which  are  included  in  the  objective  function  can  be 
seen as biasing the search process to consider only representations of a certain type: those that
minimize both the error and the penalty terms. Consequently the nature of the penalty terms is
critical  for  choosing  one  representation  over  another  and  achieving  the  previously  stated  goals. 
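
The effect of adding a penalty term to the error function can be sketched as follows. The abstract does not state which penalty terms are compared, so the code assumes, purely for illustration, a saturating penalty of w^2 / (1 + w^2) per weight; the coefficient `lam`, the learning rate `lr`, and the function names are hypothetical.

```python
def penalty(weights, lam):
    """Assumed penalty: lam * sum of w^2 / (1 + w^2) over all weights."""
    return lam * sum(w * w / (1.0 + w * w) for w in weights)

def penalty_gradient(weights, lam):
    """Derivative of the assumed penalty with respect to each weight."""
    return [lam * 2.0 * w / (1.0 + w * w) ** 2 for w in weights]

def penalised_step(weights, error_grad, lam, lr):
    """One gradient-descent step on (error + penalty)."""
    pg = penalty_gradient(weights, lam)
    return [w - lr * (eg + g) for w, eg, g in zip(weights, error_grad, pg)]

# With no error gradient, the penalty alone pulls a weight toward zero.
shrunk = penalised_step([0.1, -2.0], [0.0, 0.0], lam=1.0, lr=0.1)
```

Under this particular choice the pull toward zero fades for large weights, since the gradient 2w / (1 + w^2)^2 falls off as |w| grows. A different penalty would bias the search toward different representations, which is the comparison the paper sets out to make.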

In  this  paper  we  introduce  Rumelhart's  minimal  networks  idea  and  compare  several  possible 
constraints  on  the  weight  search  space.  These  constraints  are  compared  in  both  simple  counting 
problems  and  a  real  world  speech  recognition  problem.  In  general,  the  constrained  search  does 
seem  to  minimize  the  number  of  hidden  units  required  with  an  expected  increase  in  local  minima. 

P1.20 IMPLEMENTING THE PRINCIPLE OF MAXIMUM INFORMATION PRESERVATION:
LOCAL ALGORITHMS FOR BIOLOGICAL AND SYNTHETIC NETWORKS

RALPH  LINSKER,  IBM  T.J.  Watson  Research  Center,  Yorktown  Heights,  NY 

The principle of maximum information preservation has been proposed [R. Linsker, Computer 21 (3)
105-117 (March 1988)] as a possible organizing principle for multilayered perceptual networks. Each
resulting processing stage has the property that its output values enable one to discriminate optimally
(in  an  information-theoretic  sense)  among  the  input  patterns  presented  to  it.  I  describe  local 
algorithms  for  implementing  this  principle  in  certain  types  of  nonlinear  networks  and  adaptive  filter 
banks.  The  resulting  feature  maps  are  discussed  for  input  ensembles  of  interest  for  biological  and 
synthetic  network  development.  New  results  for  feature  map  "magnification  factors"  are  also 
obtained,  and  are  consistent  with  biological  expectations. 

P1.21 BIOLOGICAL IMPLICATIONS OF A PULSE-CODED REFORMULATION OF KLOPF'S
DIFFERENTIAL-HEBBIAN  LEARNING  ALGORITHM 

MARK  A.  GLUCK,  DAVID  PARKER,  ERIC  REIFSNIDER,  Stanford  University 

We present a pulse-coded reformulation of Klopf's (1987) Differential-Hebbian learning model. The
time derivative of pulse coded information can be calculated without using any unstable differencing
methods. Thus, learning algorithms such as Klopf's, which depend on computing derivatives of
activations, are more easily and stably implemented in a pulse-coded system. Furthermore, through
the use of discrete pulses as the inputs and outputs of the model, instead of levels of activation, the
pulse-coded Differential-Hebbian model will more closely simulate the physical processes occurring in
a  single  neuron.  This  allows  us  to  explore  possible  further  parallels  between  the  model  and  the 
biological  substrates  underlying  classical  conditioning  (see,  e.g.,  Gluck  &  Thompson,  1987;  Donegan, 
Gluck  &  Thompson,  in  press).  From  an  engineering  perspective,  it  also  suggests  possible  designs  for 
the  implementation  of  simple,  stable,  real-time  adaptive  signal-processing  systems. 

P1.22 COMPARISON OF TWO LP PARAMETRIC REPRESENTATIONS IN A NEURAL
NETWORK-BASED  SPEECH  RECOGNIZER 

K.K.  PALIWAL,  Computer  Systems  and  Communications  Group,  Tata  Institute  of  Fundamental 
Research,  Bombay 

Although  the  different  linear  prediction  (LP)  parametric  representations  provide  equivalent  information 
about  the  short-time  spectral  envelope  of  speech,  these  representations  are  known  to  show 
differences  in  their  speech  recognition  performance  when  used  with  conventional  linear  pattern 
classifiers. Recently, an error back-propagation algorithm has been reported in the literature for
training artificial neural networks, and it has been shown that multi-layer perceptron (MLP)
classifiers, which are nonlinear in nature, can provide arbitrarily shaped decision surfaces in the
multidimensional pattern space. The aim of the present paper is to see whether the different LP
parametric representations show differences in their speech recognition performance for these
nonlinear MLP classifiers, too. For this, the 2-layer, the 3-layer and the 4-layer perceptron classifiers
are  studied  here  for  the  following  two  LP  parametric  representations:  1)  the  LP  coefficient 
representation  and  2)  the  cepstral  coefficient  representation.  The  results  for  the  conventional  linear 
pattern classifiers are also provided here for the sake of completeness. It is shown that, like the
conventional pattern classifiers, the MLP classifiers also result in better recognition performance for
the  cepstral  coefficient  representation  than  for  the  LP  coefficient  representation. 

P1.23 NONLINEAR DYNAMICAL MODELING OF SPEECH USING NEURAL NETWORKS

NAFTALI TISHBY,  AT&T  Bell  Laboratories,  Murray  Hill,  NJ 

Natural  speech  is  shown  to  behave  as  an  output  of  a  low  dimensional  nonlinear  dynamical  system. 
The correlation dimension of the attractor of the speech signal is measured to be between 2 and 5 for
voiced speech and between 4 and 9 for unvoiced speech sounds. By training a multilayered network as a nonlinear
predictor, a dynamical system was created, which generated speech-like signals, even without any
excitation. 

P1.24  USE  OF  MULTI-LAYERED  NETWORKS  FOR  CODING  SPEECH  WITH  PHONETIC
FEATURES 

YOSHUA  BENGIO,  RENATO  DE  MORI,  School  of  Computer  Science,  McGill  University

A  new  method  is  proposed  for  coding  speech  based  on  spectral  samples  and  properties  extracted 
using  operators  acting  as  windows  analysing  the  data  with  variable  time  and  frequency  resolution, 
executed  when  certain  preconditions  are  found  in  the  data  and  feeding  the  input  layer  of  several 
multi-layered  neural  networks.  The  error  back-propagation  algorithm  was  used  to  train  the  neural
networks.  In  an  experiment  with  the  E-set  (B,C,D,E,G,K,P,V,3)  an  overall  error  rate  of  9.5%  was
obtained.  In  an  experiment  with  transitions  in  the  context  of  /a/,  /ae/,  /o/,  and  /u/  error  rates  of  3%, 
4%,  0%  and  0%  respectively  were  obtained. 

P1.25  SPEECH  PRODUCTION  USING  NEURAL  NETWORK  WITH  COOPERATIVE  LEARNING
MECHANISM 

MITSUO  KOMURA,  AKIO  TANAKA,  International  Institute  for  Advanced  Study  of  Social  Information 
Science,  Fujitsu  Limited,  Shizuoka 

We  propose  a  new  neural  network  model  and  its  learning  algorithm.  The  proposed  neural  network
consists  of  four  layers:  input,  hidden,  output  and  final  output  layers.  There  are  multiple  hidden  and
output  layers.  The  proposed  algorithm  has  the  following  three  features.  (1)  The  singular  points  of  the  BP
(Back  Propagation)  algorithm  are  removed.  (2)  Using  the  Spread  Pattern  Information  (SI)  learning
algorithm  proposed  here,  the  network  learns  analog  data  accurately.  (3)  Using  the  Cooperative  Learning
(CL)  algorithm  proposed  here,  it  is  possible  to  learn  analog  data  stably  and  to  obtain  smooth  outputs.

We  have  developed  a  speech  production  system.  The  system  consists  of  a  phonemic  symbol 
production  subsystem  and  an  acoustic  parameter  production  subsystem  using  the  proposed  neural 
network.  We  have  succeeded  in  producing  natural  speech  waves  with  high  accuracy. 

P1.26  TEMPORAL  REPRESENTATIONS  IN  A  CONNECTIONIST  SPEECH  SYSTEM

ERICH  J.  SMYTHE,  Computer  Science  Department,  Indiana  University,  Bloomington 

SYREN  is  a  connectionist  model  that  uses  temporal  information  from  events  in  a  speech  signal  for 
syllable  recognition.  The  rates  and  directions  of  formant  center  transitions  are  identified,  and  an 
adaptive  method  associates  transition  events  with  each  syllable.  The  system  uses  explicit  temporal 
representations  by  converting  temporal  effects  into  spatial  representations.  SYREN  uses  implicit 
temporal  representations  in  formant  transition  identification  through  node  activation  onset,  decay,  and 
transmission  delays  in  sub-networks  analogous  to  visual  motion  detector  cells.  SYREN  recognizes 
79%  of  six  repetitions  of  24  consonant-vowel  syllables  when  tested  on  unseen  data,  and  recognizes 
100%  of  its  training  syllables. 

P1.27  THEONET:  A  CONNECTIONIST  EXPERT  SYSTEM  THAT  ACTUALLY  WORKS

RICHARD  FOZZARD,  LOUIS  CECI,  GARY  BRADSHAW,  Departments  of  Computer  Science  and
Psychology,  University  of  Colorado  at  Boulder

The  Space  Environment  Laboratory  in  Boulder  has  collaborated  with  the  University  of  Colorado  to 
construct  a  small  expert  system  for  solar  flare  forecasting,  called  THEO.  It  performed  as  well  as  a 
skilled  human  forecaster.  We  have  constructed  TheoNet,  a  three-layer  back-propagation
connectionist  network  that  learns  to  forecast  flares  as  well  as  THEO  does.

A  study  of  the  internal  representations  constructed  by  the  network  may  give  insights  to  the 
"microstructure"  of  reasoning  processes  in  the  human  brain.  TheoNet’s  success  suggests  that  a
connectionist  network  can  perform  the  task  of  knowledge  engineering  automatically. 

P1.28  AN  INFORMATION  THEORETIC  APPROACH  TO  RULE-BASED  CONNECTIONIST
EXPERT  SYSTEMS 

RODNEY  M.  GOODMAN,  JOHN  W.  MILLER,  PADHRAIC  SMYTH,  California  Institute  of  Technology 

In  this  paper  we  present  a  new  method  for  implementing  fast  expert  systems  using  a  neural  network 
approach.  The  basis  of  our  model  is  a  new  information  theoretic  approach  to  rule  based  induction 
and  inferencing,  and  in  this  paper  we  show  how  such  a  model  can  be  implemented  on  a  connectionist 
architecture.  We  present  an  algorithm  for  automatically  learning  network  weights  which  correspond 
to  "rules"  in  our  model,  and  show  theoretically  and  via  simulations  how  probabilistic  inferencing  is
being  performed  by  the  network  according  to  information  theoretic  principles. 

P1.29  NEURAL  TV  IMAGE  COMPRESSION  USING  HOPFIELD  TYPE  NETWORKS

M.  NAILLON,  J.B.  THEETEN,  G.  NOCTURE,  Laboratoires  d’Electronique  et  de  Physique  Appliquée
(Limeil  Brevannes  Cedex,  France) 

A  self-organizing  Hopfield  network  is  currently  being  developed  for  Vector  Quantization  oriented 
toward  television  image  compression,  to  optimize  the  codebook  in  case  of  a  non-connex  training  set 
of  vectors.  The  metastable  states  of  the  net  are  used  as  extra-codes  in  low  image  density  regions. 
The  self-organization,  controlled  by  a  distortion  measure,  is  realized  by  learning  adaptively,  among 
the  stable  and  the  metastable  states,  the  most  relevant  attractors  for  the  coding  task.  The  Minimal
Overlap  Learning  (W.  Krauth  and  M.  Mezard,  1987)  is  shown  to  be  optimal  for  making  tractable  the 
metastability. 

P1.30  PERFORMANCE  OF  SYNTHETIC  NEURAL  NETWORK  CLASSIFICATION  OF  NOISY
RADAR  SIGNALS 

I.  JOUNY,  F.D.  GARBER,  ElectroScience  Laboratory,  Department  of  Electrical  Engineering,  Ohio 
State  University 

In  this  paper,  several  synthetic  network  models  are  used  to  classify  radar  signal  measurements  from 
five  commercial  aircraft.  The  performance  of  the  classifiers  is  evaluated,  by  means  of  computer 
simulation  studies,  under  a  number  of  conditions  concerning  the  noise  and  radar  receiver  models, 
and  azimuth  and  elevation  angle  ambiguity.  The  results  obtained  using  the  synthetic  neural  network 
classifiers  are  compared  with  those  obtained  using  an  (optimal)  maximum-likelihood  classifier  and  a 
(minimum-distance)  nearest-neighbor  classifier.  These  results  demonstrate  the  capability  of  synthetic 
network  models  to  be  trained  (under  noisy  conditions)  to  classify  noisy  measurements  of  radar 
signals.  It  is  also  shown,  through  the  results  of  classification  studies,  that  classification  systems 
based  on  synthetic  network  models  can  be  designed  to  realize  near-optimum  performance,  even  in 
situations  with  measurement  ambiguities  and  mismatch  of  noise  power-levels  for  training  and 
operation. 

P1.31  THE  NEURAL  ANALOG  DIFFUSION-ENHANCEMENT  LAYER  (NADEL)  AND  EARLY
VISUAL  PROCESSING 

ALLEN  M.  WAXMAN,  MICHAEL  SIEBERT,  Laboratory  for  Sensory  Robotics,  Boston  University 

We  introduce  a  new  class  of  neural  network  aimed  at  early  visual  processing;  we  call  it  a  Neural 
Analog  Diffusion-Enhancement  Layer  or  NADEL.  The  network  consists  of  two  levels  which  are 
coupled  through  nonlinear  feedback.  The  lower  level  is  a  two-dimensional  diffusion  map  which 
accepts  binary  visual  features  as  input  (e.g.,  edges  and  points)  and  spreads  activity  over  larger
scales  as  a  function  of  time.  The  upper  layer  is  fed  the  activity  from  the  diffusion  layer  and  serves  to 
locate  local  maxima  in  it.  These  local  maxima  are  fed  back  to  the  diffusion  layer  using  an  on- 
center/off-surround  shunting  anatomy.  The  maxima  are  also  available  as  output  of  the  network.  The 
network  dynamics  serves  to  cluster  features  on  multiple  scales  and  can  be  used  to  support  a  large 
variety  of  early  visual  processing  tasks  such  as:  extraction  of  corners  and  high  curvature  points,  line
end  detection,  filling  gaps  and  completing  contour  boundaries,  generating  saccadic  eye  motion 
sequences,  perceptual  grouping  on  multiple  scales,  correspondence  in  long-range  apparent  motion, 
and  building  2-D  shape  representations  that  are  invariant  to  location,  orientation  and  scale  on  the
visual  field.  The  NADEL  is  now  being  designed  for  implementation  in  Analog  VLSI. 

P1.32  A  COOPERATIVE  NETWORK  FOR  COLOR  SEGMENTATION

A.  HURLBERT,  T.  POGGIO,  Center  for  Biological  Information  Processing  and  the  Artificial 
Intelligence  Laboratory,  Massachusetts  Institute  of  Technology 

A  crucial  problem  in  the  computation  of  color  is  to  find  changes  in  surface  color  irrespective  of  the 
illuminant.  We  have  developed  a  cooperative  network,  similar  to  the  stereo  algorithm  of  Marr  and 
Poggio,  that  takes  as  input  the  ratio  of  image  irradiances  in  two  distinct  spectral  channels.  The
network  gives  satisfactory  results  on  natural  images,  requiring  a  few  iterations  to  assign  constant
colors  to  object  surfaces.  Discontinuities  in  color  are  localised  with  the  help  of  edges  in  the 
brightness  image  that  forms  one  of  the  inputs  to  the  cooperative  network.  The  algorithm  may  help  to 
explain  several  phenomena  associated  with  color  constancy  in  animals. 

P1.33  NEURAL  NETWORK  STAR  PATTERN  RECOGNITION  FOR  SPACECRAFT  ATTITUDE
DETERMINATION  AND  CONTROL 

PHILLIP  ALVELDA,  MIGUEL  A.  SAN  MARTIN,  CHARLES  E.  BELL,  JACOB  BARHEN,  Jet 
Propulsion  Laboratory,  California  Institute  of  Technology 

Some  of  the  most  complex  spacecraft  attitude  determination  and  control  tasks  are  ultimately 
governed  by  ground  based  systems  and  personnel.  Conventional  space-qualified  systems  face 
severe  computational  bottlenecks  introduced  by  serial  microprocessors  operating  on  inherently 
parallel  problems.  New  computer  architectures  based  on  the  anatomy  of  the  human  brain  seem  to 
promise  high  speed  and  fault  tolerant  solutions  to  some  of  the  inherent  limitations  of  standard 
microprocessors.  This  paper  will  discuss  the  latest  applications  of  artificial  Neural  Networks  to  the 
problem  of  star  pattern  recognition  for  spacecraft  attitude  determination. 

P1.34  NEURAL  NETWORKS  THAT  LEARN  TO  DISCRIMINATE  SIMILAR  KANJI
CHARACTERS 

YOSHIHIRO  MORI,  KAZUHIKO  YOKOSAWA,  ATR  Auditory  and  Visual  Perception  Research 
Laboratories,  Osaka 

A  neural  network  is  applied  to  the  problem  of  recognizing  Kanji  characters.  With  the  back 
propagation  network  learning  algorithm,  a  three-layered  feed-forward  network  is  trained  to  recognize
similar  handwritten  Kanji  characters.  In  addition,  two  new  methods  are  utilized  to  make  training 
effective.  Recognition  rates  were  92.0%  for  testing  samples  and  99.5%  for  training  samples. 
Through  an  analysis  of  connection  weights,  it  became  clear  that  trained  networks  could  find  the 
hierarchical  structure  of  Kanji  characters.  This  strategy  of  trained  networks  makes  high  recognition 
accuracy  possible.  Our  results  suggest  that  neural  networks  are  very  effective  for  Kanji  character 
recognition. 

P1.35  FURTHER  EXPLORATIONS  IN  THE  LEARNING  OF  VISUALLY-GUIDED  REACHING:
MAKING  MURPHY  SMARTER 

BARTLETT  W.  MEL,  Center  for  Complex  Systems  Research,  University  of  Illinois  at  Urbana- 
Champaign 

MURPHY  is  an  unsupervised  robot/camera  learning  system  that  has  been  applied  to  the  problem  of 
grabbing  objects  in  the  presence  of  obstacles.  MURPHY’s  internal  representations  consist  of  several
coarse-coded  populations  of  simple  units  encoding  both  static  and  dynamic  aspects  of  the  sensory- 
motor  environment.  By  moving  its  arm  around  in  its  visual  field,  MURPHY  learns  to  relate  motor 
commands  to  sensory  consequences  via  simple  one-layer  weight  modification  among  the  various  unit 
populations.  Initially  MURPHY  grabs  objects  via  heuristic  search;  recent  enhancements  allow 
MURPHY  to  minimize  search  and  improve  grabbing  performance  with  practice.  Under  current 
investigation  are  a  range  of  simple  heuristics  for  obstacle  avoidance  that  exploit  the  explicitly  visual 
nature  of  MURPHY’s  principal  internal  representation.

P1.36  USING  BACKPROPAGATION  TO  LEARN  THE  DYNAMICS  OF  A  REAL  ROBOT  ARM

KEN  GOLDBERG,  BARAK  PEARLMUTTER,  Department  of  Computer  Science,  Carnegie  Mellon 
University 

The  dynamics  of  a  robot  arm  specify  the  torques  necessary  to  achieve  a  desired  trajectory. 
Application  of  these  torques  allows  an  arm  to  be  controlled  more  accurately  than  with  conventional 
feedback  alone.  Computing  the  dynamics  is  thus  an  active  area  of  research  in  robotics.  In  this  paper 
we  apply  a  neural  network  to  the  dynamics  problem  and  measure  its  performance  on  the  CMU  Direct 
Drive  Arm  II. 

We  use  a  temporal  window  of  desired  positions  as  input  to  a  3-layer  backpropagation  network.  To 
test  generalization,  we  identify  a  family  of  "pick  and  place"  trajectories.  After  training  on  a  random 
sample  of  five  trajectories  from  this  family  (run  on  the  physical  arm)  the  network  generalizes  to 
members  of  the  same  family  with  root  mean  square  errors  of  less  than  4%.  We  find  that 
generalization  initially  improves  and  then  falls  as  window  size  is  increased.  We  analyze  the  weights 
developed  during  the  learning  phase  in  terms  of  the  velocity  and  acceleration  filters  used  in 
conventional  control  theory. 

Finally,  we  consider  the  network's  ability  to  generalize  to  larger  regions  of  the  state  space  and  report 
preliminary  simulation  results.  Trained  on  a  sample  of  300  points  chosen  randomly  from  state  space, 
a  small  backpropagation  network  can  learn  the  training  set  with  an  RMS  error  of  1%,  is  able  to 
generalize  to  the  test  trajectories  with  an  RMS  error  of  0.7%,  and  is  able  to  generalize  to  other  points 
chosen  randomly  from  phase  space  with  an  RMS  error  of  1.2%.

P1.37  BACKPROPAGATION  AND  ITS  APPLICATION  TO  HANDWRITTEN  SIGNATURE
VERIFICATION 

DOROTHY  MIGHELL,  JOSEPH  GOODMAN,  Stanford  University 

A  pool  of  handwritten  signatures  is  used  to  train  a  neural  network  for  the  task  of  deciding  whether  or 
not  a  given  signature  is  a  forgery.  The  network  is  a  feedforward  net,  with  a  binary  image  as  input. 
There  is  a  hidden  layer,  with  a  single  unit  output  layer.  The  weights  are  adjusted  according  to  the 
backpropagation  algorithm.  The  signatures  are  entered  into  a  C  software  program  through  the  use  of 
a  Datacopy  Electronic  Digitizing  Camera.  The  binary  signatures  are  normalized  and  centered.  The 
performance  is  examined  as  a  function  of  the  training  set  and  network  structure.  The  best  scores  are 
on  the  order  of  2%  true  signature  rejection  with  2-4%  false  signature  acceptance. 


ORAL  SESSION  02
APPLICATIONS


2:20  02.1  SPEECH  RECOGNITION

JOHN  BRIDLE,  Royal  Radar  Establishment,  Malvern,  U.K.
Invited  talk.


3:00  02.2  APPLICATIONS  OF  ERROR  BACK-PROPAGATION  TO  PHONETIC  CLASSIFICATION

HONG  C.  LEUNG,  VICTOR  W.  ZUE,  Department  of  Electrical  Engineering  and  Computer  Science,
Massachusetts  Institute  of  Technology

This  paper  is  concerned  with  the  application  of  error  back-propagation  (BP)  to  pattern  classification.
Our  investigation  based  on  classification  of  the  16  spoken  American  vowels  reveals  that  BP  can
integrate  heterogeneous,  numerical,  and  symbolic  sources  of  information.  Depending  on  the  amount
of  information  provided,  the  network  achieves  performance  ranging  from  60%  to  72%,  which  is
comparable  to  the  average  agreement  among  human  listeners.  By  using  back  propagation  with  a
weighted  mean  square  error,  the  rank-order  statistics  of  the  network  can  be  improved.  Training
characteristics,  self-organization  into  natural  articulatory  classes,  rapid  speaker  adaptation,  and  direct
comparisons  with  other  classification  techniques  are  also  discussed.


3:30  02.3  MODULARITY  IN  NEURAL  NETWORKS  FOR  SPEECH  RECOGNITION

A.  WAIBEL,  Department  of  Computer  Science,  Carnegie  Mellon  University

In  this  paper  we  show  that  neural  networks  for  speech  recognition  can  be  constructed  in  a  modular
fashion  by  exploiting  the  hidden  structure  of  previously  trained  phonetic  subcategory  networks.  The
performance  of  resulting  larger  phonetic  nets  was  found  to  be  as  good  as  the  performance  of  the
subcomponent  nets  by  themselves.  This  approach  avoids  the  excessive  learning  times  that  would  be
necessary  to  train  larger  networks  and  allows  for  incremental  learning  (generally  not  easily  possible  in
networks  that  inhibit  incorrect  output  categories).


4:00  BREAK


4:20  02.4  NEURAL  NETWORK  RECOGNIZER  FOR  HAND-WRITTEN  ZIP  CODE  DIGITS:
REPRESENTATIONS,  ALGORITHMS,  AND  HARDWARE

J.S.  DENKER,  H.P.  GRAF,  L.D.  JACKEL,  R.E.  HOWARD,  W.  HUBBARD,  D.  HENDERSON,  W.R. 
GARDNER,  AT&T  Bell  Laboratories,  Holmdel,  NJ;  H.S.  BAIRD,  AT&T  Bell  Laboratories,  Murray  Hill, 
NJ;  I.  GUYON,  ESPCI,  Paris 

A  neural-network  digit  recognition  system  has  been  developed  and  used  to  classify  handwritten 
zipcodes  taken  from  actual  envelopes.  The  system  includes  a  VLSI  chip  for  fast  feature  extraction. 
Numerous  classifiers  (neural  and  classical)  were  tested  and  compared.  This  process  has  exposed 
basic  issues  relevant  to  other  categorization  tasks. 


4:50  02.5  ALVINN:  AN  AUTONOMOUS  LAND  VEHICLE  IN  A  NEURAL  NETWORK 

DEAN  A.  POMERLEAU,  Computer  Science  Department,  Carnegie  Mellon  University 

ALVINN  (Autonomous  Land  Vehicle  In  a  Neural  Network)  is  a  back-propagation  network  designed  for 
the  navigational  task  of  road  following.  Currently  ALVINN  is  designed  to  process  images  from  a  video 
camera  and  a  laser  rangefinder,  producing  as  output  an  estimate  of  the  orientation  of  the  road  in  the 
image  and  the  direction  the  vehicle  should  travel  to  head  towards  the  road  center. 

Training  has  been  conducted  using  synthetic  road  images.  Tests  using  novel  synthetic  roads  and  a 
limited  number  of  real  road  sequences  indicate  the  network  should  accurately  follow  actual  roads. 
Currently  we  are  implementing  ALVINN  on  the  NAVLAB  vehicle  at  CMU  to  determine  the 
performance  of  the  network  under  field  conditions.  In  order  to  increase  road  following  accuracy  we 
are  also  exploring  various  network  architectures,  including  networks  with  feedback  from  one  image  to 
the  next. 


5:20  02.6  NEURAL  NET  RECEIVERS  IN  SPREAD-SPECTRUM  MULTIPLE-ACCESS
COMMUNICATION  SYSTEMS

BERND-PETER  PARIS,  GEOFFREY  ORSAK,  MAHESH  K.  VARANASI,  BEHNAAM  AAZHANG, 
Department  of  Electrical  and  Computer  Engineering,  Rice  University 

The  application  of  neural  networks  to  the  demodulation  of  direct  sequence  spread-spectrum  signals  in 
a  multiple-access  environment  is  considered.  The  use  of  neural  net  receivers  in  this  environment  is 
motivated  by  the  fact  that  the  optimum  receiver  is  too  complex  to  be  of  practical  use.  The  optimum 
receiver  is  used  to  benchmark  the  performance  of  the  neural  net  receiver;  in  particular,  it  is  proven  to 
be  instrumental  in  identifying  the  way  decisions  are  made  by  the  neural  network.  It  is  shown  that  the 
convergence  of  the  back-propagation  algorithm  can  be  accelerated  substantially  by  proper  selection 
of  the  initial  weights.  Furthermore,  the  method  of  Importance  Sampling  is  introduced  to  reduce  the 
number  of  simulations  necessary  to  evaluate  the  performance  of  neural  nets.  In  all  examples
considered  the  proposed  neural  net  receiver  significantly  outperforms  the  conventional  matched  filter 
receiver. 


6:00  BEER  AND  CHIPS 


8:00  POSTER  SESSION  1 


WEDNESDAY  MORNING


ORAL  SESSION  03
NEUROBIOLOGY


8:30  03.1  CRICKET  WIND  DETECTION

JOHN  MILLER,  Department  of  Zoology,  University  of  California,  Berkeley
Invited  talk.


9:10  03.2  A  PASSIVE,  SHARED  ELEMENT  ANALOG  ELECTRONIC  COCHLEA

D.  FELD,  J.  EISENBERG,  E.R.  LEWIS,  Department  of  Electrical  Engineering  and  Computer  Science,
University  of  California,  Berkeley

We  have  simulated  an  electrical  cochlea,  which  models  the  micromechanics  of  the  human  ear.  In  this
respect  it  differs  from  other  recent  cochlear  models  that  have  been  proposed.  In  our  model  we
observe  extraordinarily  sharp  high  frequency  rolloffs  and  linear  phase,  characteristics  measured  in
the  mammalian  cochlea.  We  also  observe  corner  frequencies  spanning  nearly  seven  octaves  in  the
normal  range  of  human  audition.  By  basing  our  model  on  physiological  structure,  we  can  obtain  a
better  understanding  of  the  underlying  mechanisms  in  the  auditory  system,  leading  to  a  more
complete  characterization  of  the  parallel  processing  performed  by  front-end  neural  networks.


9:40  03.3  NEURONAL  MAPS  FOR  SENSORY-MOTOR  CONTROL  IN  THE  BARN  OWL

C.D.  SPENCE,  J.C.  PEARSON,  J.J.  GELFAND,  R.M.  PETERSON,  David  Sarnoff  Research  Center,
Princeton,  NJ;  W.E.  SULLIVAN,  Department  of  Biology,  Princeton  University

The  Barn  Owl  has  fused  visual/auditory/motor  representations  of  space  in  its  midbrain  which  are  used
to  orient  the  head  so  that  visual  or  auditory  stimuli  are  centered  in  the  visual  field  of  view.  We  present
models  and  computer  simulations  of  these  structures  which  address  various  problems,  including  the
construction  of  a  map  of  space  from  auditory  sensory  information,  the  adaptive  fusion  of  information
from  different  senses,  and  the  problem  of  driving  the  motor  system  from  these  maps.  We  compare
the  results  with  biological  data.


10:10  BREAK


10:30  03.4  SIMULATING  CAT  VISUAL  CORTEX:  CIRCUITRY  UNDERLYING  ORIENTATION
SELECTIVITY

U.J.  WEHMEIER,  D.C.  VAN  ESSEN,  C.  KOCH,  Division  of  Biology,  California  Institute  of  Technology

Many  models  have  been  proposed  to  account  for  critical  orientation  and  direction  tuning  in  the  visual
system  of  mammals.  We  investigate  a  number  of  these  strategies,  in  particular  those  invoking
intracortical  inhibition,  to  test  their  agreement  with  known  cortical  anatomy  and  physiology.  Our
computer  model  of  the  early  visual  system  in  cat  simulates  the  dynamics  of  neurons  within  a  small  (2°
by  2°)  patch  of  visual  angle  in  the  retina,  its  projection  to  LGN  and  its  subsequent  projection  to  layer
IVc  in  cortical  area  17.  Extensions  to  the  simulator  also  permit  detailed  modelling  of  individual  cells  in
cortical  layer  IVc,  as  well  as  generation  of  receptive  field  contours  of  cortical  cells.


11:00  03.5  MODEL  OF  OCULAR  DOMINANCE  COLUMN  FORMATION:  ANALYTICAL  AND
COMPUTATIONAL  RESULTS

K.D.  MILLER,  J.B.  KELLER,  M.P.  STRYKER,  Department  of  Physiology,  University  of  California,  San 
Francisco  and  Departments  of  Neuroscience  and  Mathematics,  Stanford  University 

We  have  previously  developed  a  simple  mathematical  model  for  formation  of  ocular  dominance 
columns  in  mammalian  visual  cortex  (Soc.  Neur.  Abs.  12:1373  (1986)).  The  model  provides  a 
common  framework  in  which  a  variety  of  activity-dependent  biological  models  can  be  studied. 

Analytic  and  computational  results  together  now  reveal  the  following:  if  afferents  within  each  eye  are 
locally  correlated  in  their  firing,  and  are  not  anticorrelated  within  an  arbor  radius,  monocular  cells  will 
robustly  form  and  be  organized  by  intra-cortical  interactions  into  columns.  Broader  correlations  within 
each  eye,  or  anti-correlations  between  the  eyes,  create  a  more  purely  monocular  cortex;  positive 
correlation  over  an  arbor  radius  yields  an  almost  perfectly  monocular  cortex.  The  width  of  the 
columns,  as  determined  by  computing  the  power  spectra  of  the  columnar  patterns,  can  be  accurately 
predicted  from  the  biological  functions  input  to  the  model.  The  effects  of  monocular  deprivation, 
modelled  by  reducing  the  activity  within  one  eye,  are  accurately  reproduced,  and  a  critical  period  is 
seen.  Most  features  of  the  model  can  be  analytically  understood  through  decomposition  into 
eigenfunctions  and  linear  stability  analysis,  allowing  predictions  of  the  results  expected  under  a  given 
plasticity  model  from  measured  biological  parameters. 


11:30  03.6  MODELING  A  CENTRAL  PATTERN  GENERATOR  IN  SOFTWARE  AND  HARDWARE:
TRITONIA  IN  SEA  MOSS

S.  RYCKEBUSCH,  C.  MEAD,  J.M.  BOWER,  Computational  Neural  Systems  Program,  California 
Institute  of  Technology 

We  will  present  a  model  implemented  in  software  and  hardware  (CMOS)  of  the  central  pattern 
generator  that  controls  the  swimming  behavior  of  the  marine  mollusc  Tritonia.  This  CPG  is  capable  of
generating  different  patterns  of  neuronal  oscillations  and  resulting  animal  movements  depending  on 
the  strength  of  the  input  that  the  animal  receives  from  the  sensory  periphery.  The  CMOS 
implementations  of  a  model  of  this  system  are  capable  of  replicating  this  behavior.  The  CMOS  circuit 
we  have  built  is  based  on  an  analog  equivalent  neuron  that  has  bursting  properties  similar  to  those 
found  in  the  real  neural  network.  We  will  describe  and  discuss  the  roles  played  by  neuronal 
connections  with  different  time  constants  in  the  activity  of  this  neuron. 


POSTER  SESSION  P2A
NEUROBIOLOGY 


12:00  POSTER  PREVIEW  2 

P2.1  STORAGE  OF  COVARIANCE  BY  THE  SELECTIVE  LONG-TERM  POTENTIATION  AND 
DEPRESSION  OF  SYNAPTIC  STRENGTHS  IN  THE  HIPPOCAMPUS 

P.K.  STANTON,  J.  JESTER,  S.  CHATTARJI,  T.J.  SEJNOWSKI,  Department  of  Biophysics,  Johns 
Hopkins  University 

Many  network  learning  algorithms  require  mechanisms  permitting  both  the  long-term  reduction  as 
well  as  the  enhancement  of  synapse  strength.  We  are  exploring  the  physiological  conditions  for  the 
induction  of  long-term  potentiation  (LTP)  and  depression  (LTD)  of  synaptic  strengths  in  the
hippocampus.  The  rhythmic  bursting  of  a  strong  input  is  effective  in  producing  LTP.  We  have  found 
that  a  weak  input  which  by  itself  does  not  cause  any  persistent  change  in  synaptic  strength  can  either 
increase  in  strength  (associative  LTP)  or  decrease  in  strength  (associative  LTD)  depending  on  its 
phase  of  arrival  within  the  rhythm  of  a  strong  input  that  produces  LTP.  Thus,  information  contained  in 
the  covariance  of  the  weak  and  strong  input  can  be  stored  (Sejnowski,  T.J.,  J.  Theo.  Biol.  4:  203-211, 
1976). 
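
As a rough illustration of the covariance idea in the abstract above, the following Python sketch applies a covariance-type update, in which the weight change is proportional to the product of the deviations of two activity traces from their means. The function name, the learning rate, and the sinusoidal "rhythm" signals are assumptions made for illustration only; they are not taken from the paper and do not model its physiology.

```python
import math


def covariance_weight_change(pre, post, rate=0.1):
    """Return rate times the covariance of two equal-length activity traces."""
    n = len(pre)
    mean_pre = sum(pre) / n
    mean_post = sum(post) / n
    cov = sum((x - mean_pre) * (y - mean_post) for x, y in zip(pre, post)) / n
    return rate * cov


# Hypothetical rhythmic activity: a strong input, and a weak input that
# arrives either in phase or in anti-phase with the strong rhythm.
steps = 100
strong = [math.sin(2 * math.pi * t / 20) for t in range(steps)]
weak_in_phase = [0.2 * s for s in strong]
weak_anti_phase = [-0.2 * s for s in strong]

dw_in = covariance_weight_change(weak_in_phase, strong)      # positive change
dw_anti = covariance_weight_change(weak_anti_phase, strong)  # negative change
```

Under these assumptions the in-phase weak input yields a positive weight change and the anti-phase input a negative one of equal size, which loosely mirrors the associative LTP and LTD outcomes described in the abstract.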

P2.2  A  MATHEMATICAL  MODEL  OF  THE  OLFACTORY  BULB 

ZHAOPING  LI,  J.J.  HOPFIELD,  Division  of  Biology,  California  Institute  of  Technology

The  olfactory  bulb  of  mammals,  the  first  processing  center  after  the  sensory  cells  in  the  olfactory 
pathway,  is  believed  to  aid  in  the  discrimination  of  odors.  A  mathematical  model  based  on  the  bulbar 
anatomy  and  electrophysiology  has  been  constructed.  Simulations  produce  a  35-60  Hz  modulated 
activity  which  is  coherent  across  the  bulb,  and  mimics  the  observed  field  potentials.  A  linear  analysis 
reveals  the  mechanism  of  oscillations  and  their  patterns'  dependence  on  odor  inputs.  Analysis  and 
simulation  show  that  the  bulb,  with  appropriate  inputs  to  its  inhibitory  cells  from  higher  centers,  can 
enhance  or  suppress  the  sensitivity  to  particular  odors. 

P2.3  A  MODEL  OF  NEURAL  CONTROL  OF  THE  VESTIBULO-OCULAR  REFLEX 

M.G.  PAULIN,  S.  LUDTKE,  M.  NELSON,  J.M.  BOWER,  Division  of  Biology,  California  Institute  of
Technology 

During  head  movements  there  are  compensatory  eye  movements  which  stabilize  the  eyes  and 
improve  visual  acuity  by  reducing  image  movement  across  the  retina.  In  the  upper  part  of  the 
bandwidth  of  natural  head  movements  (2-10  Hz  in  humans)  eye  stabilization  is  mainly  due  to  the
vestibulo-ocular  reflex  (VOR).  Measurements  of  VOR  dynamics  indicate  that  the  VOR  is  an  optimal 
head  velocity  estimator  which  minimizes  retinal  image  slip  during  head  movements.  We  have 
constructed  a  neural  network  model  based  on  the  actual  neural  circuit  topology  underlying  the  VOR. 
The  model  generates  the  required  dynamics  for  optimal  VOR  control  in  a  stable  manner  using 
feedback  loops  between  pools  of  neurons  and  transversal  (delay  line)  filtering.  We  are  extending  the 
model  to  examine  possible  functional  roles  of  cerebellar  cortex  in  VOR  control. 

P2.4  ASSOCIATIVE  LEARNING  IN  HERMISSENDA:  A  LUMPED  PARAMETER  COMPUTER
MODEL  OF  NEUROPHYSIOLOGICAL  PROCESSES 

DANIEL  L.  ALKON,  Laboratory  of  Molecular  and  Cellular  Neurobiology,  NINCDS,  National  Institutes 
of  Health;  FRANCIS  QUEK,  Environmental  Research  Institute  of  Michigan,  Ann  Arbor,  MI;  THOMAS
P.  VOGL,  Environmental  Research  Institute  of  Michigan,  Arlington,  VA

Electrophysiological,  biophysical,  and  biochemical  measurements  in  individual  neurons  of  the  visual- 
vestibular  pathways  of  numerous  naive,  conditioned,  and  sham-conditioned  specimens  of  the  marine 
mollusk,  Hermissenda  crassicornis,  have  demonstrated  reproducible  changes  that  are  unique  to 
associative  learning  as  exemplified  by  classical  conditioning.  In  order  to  provide  corroborative 
evidence  that  these  effects  are  necessary  and  sufficient  to  account  for  the  observed  learning, 
storage,  and  recall,  a  detailed  lumped  parameter  computer  model  of  the  relevant  neurons  and  their 
interconnections  has  been  constructed.  The  model  consists  solely  of  neuronal  features  that  can  be 
justified  explicitly  on  neurological  and  biophysical  grounds,  and  the  neurons  are  interconnected  by 
previously  mapped  pathways.  The  computer  model  correctly  reproduces  the  electrophysiological 
signals  observed  in  nature  before,  during,  and  after  inputs  that  mimic  temporally  associated  and 
random  light  and  rotation  inputs.  Particularly  noteworthy  is  the  necessity  for  incorporating  into  the 
model a number of features that fall into three broad categories: random neuronal firing, second
order  control  mechanisms,  and  history  (time)  dependent  neuronal  responses.  The  potential 
contributions  of  such  detailed  computer  models  to  both  neurobiology  and  computer  science  are 
explored. 

P2.5  RECONSTRUCTION  OF  THE  ELECTRIC  FIELDS  OF  THE  WEAKLY  ELECTRIC  FISH 
GNATHONEMUS PETERSII GENERATED DURING EXPLORATORY ACTIVITY

B. RASNOW, Department of Physics; M.E. NELSON, J.M. BOWER, Department of Biology;
C. ASSAD, Department of Electrical Engineering, California Institute of Technology

Active  probing  and  exploration  of  the  environment  is  characteristic  of  the  behavior  of  most  higher 
animals.  In  this  paper  we  will  present  results  relevant  to  motor  control  of  movements  which  affect  the 
positioning of sensory structures during active exploration. The weakly electric fish, Gnathonemus
petersii, possesses an intriguing sensory-motor system in which electric fields are generated by an
electric organ in the tail and small perturbations in the fields resulting from nearby objects in the
surrounding  environment  are  detected  by  electroreceptors  distributed  along  the  body  surface.  We  will 
describe  and  present  data  from  techniques  which  we  have  developed  for  recording  body  position  and 
measuring  electric  fields  during  exploratory  activity  of  these  fish.  This  work  represents  a  first  step  in 
quantifying  the  exploratory  behavior  of  the  fish  and  investigating  possible  computational  strategies 
they  use  during  active  exploration  of  their  environments. 

P2.6  A  MODEL  FOR  RESOLUTION  ENHANCEMENT  (HYPERACUITY)  IN  SENSORY 
REPRESENTATION 

JUN  ZHANG,  Neurobiology  Group,  JOHN  MILLER,  Department  of  Zoology,  University  of  California, 
Berkeley 

Heiligenberg  recently  proposed  a  model  to  explain  how  sensory  maps  could  enhance  resolution 
through  orderly  arrangement  of  broadly  tuned  receptors.  We  have  extended  this  model  to  a 
generalized  case  with  arbitrary  weighting  schemes.  We  prove  that  for  any  polynomial  weighting 
function  w(k),  the  response  f(x)  is  a  polynomial  function  up  to  the  same  order.  In  particular,  if  w(k)  is 
a  Hermitian  polynomial,  the  resulting  f(x)  will  be  the  identical  Hermitian  function.  For  other  spatially 
bounded  weighting  schemes,  we  prove  the  general  result  that  f(x)  is  proportional  to  w(x),  under  some 
restrictions. We also addressed the problem of "edge-effect" introduced at the boundary of the
receptor array. Finally, we investigated a real biological system (the cricket cercal sensory system)
as  an  application  of  this  model. 

P2.7 CODING SCHEMES FOR MOTION COMPUTATION IN MAMMALIAN CORTEX

H.  TAICHI  WANG,  BIMAL  P.  MATHUR,  Science  Center,  Rockwell  International,  Thousand  Oaks,  CA; 
CHRISTOF KOCH, Division of Biology, California Institute of Technology

The  representation  and  coding  scheme  chosen  for  a  particular  algorithm  is  crucial.  In  this  paper,  we 
report  the  implications  of  different  coding  schemes  of  the  direction-selective  representation  for  motion 
computation  in  the  mammalian  cortex,  in  terms  of  the  performance  and  implementation  of  the 
resulting  neural  network  models.  Two  coding  schemes,  the  winner-take-all  (WTA)  coding  and  the 
population coding, are compared in detail. Our results show that the population coding scheme is
likely to be the one used within cortex, while the shunting inhibition implementation of the WTA
coding  scheme  is  more  attractive  from  the  electronics  implementation  point  of  view. 

P2.8 THEORY OF SELF-ORGANIZATION OF CORTICAL MAPS

SHIGERU  TANAKA,  Fundamental  Research  Laboratory  of  NEC  Corporation,  Kawasaki  Kanagawa 

We  have  shown  mathematically  that  cortical  maps  in  the  primary  sensory  cortices  can  be  made  by 
using  three  hypotheses  which  do  not  conflict  with  physiological  experimental  results.  Here,  our  main 
focus  is  on  ocular  dominance  stripe  formation  in  the  primary  visual  cortex.  Monte  Carlo  simulations 
on the segregation of ipsilateral and contralateral afferent terminals are carried out. Based on these,
almost all the physiological experimental results concerning the ocular dominance stripes of cats and
monkeys reared under normal or various abnormal conditions can be explained from the viewpoint of
critical phenomena.

P2.9  A  BIFURCATION  THEORY  APPROACH  TO  THE  PROGRAMMING  OF  PERIODIC 
ATTRACTORS IN NETWORK MODELS OF OLFACTORY CORTEX

BILL  BAIRD,  Department  of  Biophysics,  University  of  California,  Berkeley 

Analytic methods of bifurcation theory are used to design algorithms for determining synaptic weights
in various network models of olfactory bulb and prepyriform cortex. The result is memory storage of
the kind of oscillating spatial patterns that appear in the bulb during inspiration and suffice to predict
the pattern recognition behavior of rabbits in classical conditioning experiments. These attractors
arise during simulated inspiration through a multiple Hopf bifurcation which acts as a critical "decision
point" for their selection by the input pattern. Basin boundaries may also be programmed, and the
location  of  secondary  bifurcations  which  introduce  new  attractors  can  be  controlled. 

P2.10  NEURONAL  CARTOGRAPHY:  POPULATION  CODING  AND  RESOLUTION 
ENHANCEMENT  THROUGH  ARRAYS  OF  BROADLY  TUNED  CELLS 

PIERRE BALDI, Department of Mathematics; WALTER HEILIGENBERG, Neurobiology Unit, SIO,
University  of  California,  San  Diego 

We investigate population coding and resolution enhancement in sensory and motor maps. Starting
from a few biological examples, we consider a simple model consisting of a one-dimensional array of
evenly spaced cells with bell-shaped tuning curves. These cells provide inputs proportional to their
degree of excitation as well as to their rank within the array so that the overall response can be taken
to be of the form $f(x) = \sum_k k \exp[-(x-k)^2/d]$. We show that as d is increased, f approaches a linear
function  in  a  very  rapid  and  robust  fashion  and  that,  the  wider  the  tuning  curves,  the  more  precise  is 
the  overall  computation.  We  study  the  effect  of  different  types  of  noise  and  boundary  conditions  on 
the model together with several of its extensions: "Mexican hat" response curves, 2-D arrays, ... and
show  that  its  basic  features  are  always  preserved.  We  examine  its  potential  engineering  applications 
and  biological  plausibility:  ontogeny,  dynamic  range,  eye  motor  control,  cochlea,  ...  and  discuss  its 
limitations. 
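
The claim that the summed response becomes linear as the tuning curves widen can be checked numerically. The sketch below evaluates f(x) = sum over k of k * exp(-(x-k)^2/d) for a finite array of cells and measures how far f(x)/sqrt(pi*d) departs from x. The array size, the test interval, the values of d, and the division by sqrt(pi*d) (the area under one tuning curve) are assumptions made here for illustration, not details given in the abstract.

```python
import math


def population_response(x, d, half_width=50):
    """f(x) = sum_k k * exp(-(x - k)^2 / d) for cells k = -half_width..half_width."""
    return sum(k * math.exp(-((x - k) ** 2) / d)
               for k in range(-half_width, half_width + 1))


def max_deviation_from_linear(d, x_max=2.0, steps=81):
    """Largest |f(x) / sqrt(pi * d) - x| on an even grid over [-x_max, x_max]."""
    scale = math.sqrt(math.pi * d)  # assumed normalization: area of one tuning curve
    xs = [-x_max + 2.0 * x_max * i / (steps - 1) for i in range(steps)]
    return max(abs(population_response(x, d) / scale - x) for x in xs)


# Narrow, medium, and wide tuning curves (illustrative widths).
deviations = {d: max_deviation_from_linear(d) for d in (0.25, 1.0, 2.0)}
for d, dev in deviations.items():
    print(f"d = {d}: max deviation from the line = {dev:.3e}")
```

The test interval is kept far from the ends of the array so that boundary effects, which the abstract treats separately, do not enter the comparison.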

P2.11 LEARNING THE SOLUTION TO THE APERTURE PROBLEM FOR PATTERN MOTION
WITH  A  HEBB  RULE 

MARTIN  I.  SERENO,  Division  of  Biology,  California  Institute  of  Technology 

The  primate  visual  system  learns  to  recognize  the  true  direction  of  pattern  motion  using  local 
detectors  only  capable  of  detecting  the  component  of  motion  perpendicular  to  the  orientation  of  the 
moving  edge.  A  multilayer  feedforward  network  model  similar  to  Linsker's  model  was  presented  with 
input  patterns  each  consisting  of  randomly  oriented  contours  moving  in  a  particular  direction.  Input 
layer  units  have  component  direction  and  speed  tuning  curves  similar  to  those  recorded  from  neurons 
in primate visual area V1 that project to MT. The network is trained on many such patterns until most
weights  saturate.  A  proportion  of  the  units  in  the  second  and  higher  layers  solve  the  aperture 
problem  (e.g.,  show  the  same  direction-tuning  curve  peak  to  plaids  as  to  gratings)  resembling  pattern- 
direction  selective  neurons  in  primate  visual  cortex,  which  first  appear  in  area  MT. 

P2.12  A  MODEL  FOR  NEURAL  DIRECTIONAL  SELECTIVITY  THAT  EXHIBITS  ROBUST 
DIRECTION  OF  MOTION  COMPUTATION 

NORBERTO  M.  GRZYWACZ,  FRANKLIN  R.  AMTHOR,  Center  for  Biological  Information  Processing, 
Whitaker  College,  Cambridge,  MA 

Directionally  selective  retinal  ganglion  cells  discriminate  direction  of  visual  motion  relatively 
independently of speed (Amthor and Grzywacz, 1988a) and contrast. An asymmetric distribution of
nonlinear  inhibition  around  each  point  of  the  receptive  field  generates  a  directional  selectivity  that  is 
computed multiple times in parallel over the field (Barlow and Levick, 1965). We propose a
directional  selectivity  model  based  on  our  recent  data  on  this  inhibition's  spatio-temporal  and 
nonlinear  properties.  The  main  prediction  of  this  model  is  the  robust  computation  of  visual  motion 
direction.  This  robustness  means  that  although  a  cell  response  depends  on  speed  and  contrast,  the 
ratio of responses of cells having different preferred directions is independent of both. This suggests that the
spatio-temporal properties of retinal directionally selective cells are particularly well adapted to motion
computations.

P2.13  A  LOW-POWER  CMOS  CIRCUIT  WHICH  EMULATES  TEMPORAL  ELECTRICAL 
PROPERTIES  OF  NEURONS 

JACK  MEADOR,  CLINT  COLE,  Department  of  Electrical  and  Computer  Engineering,  Washington 
State  University,  Pullman,  WA 

This  paper  describes  a  CMOS  artificial  neuron.  The  circuit  is  directly  derived  from  the  voltage-gated 
channel  model  of  neural  membrane,  has  low  power  dissipation,  and  small  layout  geometry.  The 
principal  motivations  behind  this  work  include  a  desire  for  high  performance,  more  accurate  neuron 
emulation,  and  the  need  for  higher  density  in  practical  neural  network  implementations. 

P2.14  A  GENERAL  PURPOSE  NEURAL  NETWORK  SIMULATOR  FOR  IMPLEMENTING 
REALISTIC  MODELS  OF  NEURAL  CIRCUITS 

M.A. WILSON, U.S. BHALLA, J.D. UHLEY, J.M. BOWER, Division of Biology, California Institute of
Technology 

To  facilitate  the  design  of  detailed,  realistic  biologically-based  models  we  have  developed  a 
graphically-oriented  general  purpose  network  simulator  designed  to  support  simulations  ranging  from 
detailed  single  cell  models  to  large  networks  of  simple  or  complex  cells.  Current  models  developed 
under  this  system  include  mammalian  olfactory  bulb  and  cortex,  invertebrate  central  pattern 
generators, as well as more abstracted connectionist-level simulations.

P2.15 TRAINING A 3-NODE NEURAL NETWORK IS NP-COMPLETE

AVRIM  BLUM,  RONALD  L.  RIVEST,  Laboratory  for  Computer  Science,  Massachusetts  Institute  of 
Technology 

We consider a simple 2-layer, 3-node, n-input neural network whose nodes compute linear threshold
functions  of  their  inputs.  We  show  that  it  is  NP-Complete  to  decide  whether  there  exist  functions  for 
the  nodes  of  this  network  so  that  it  will  produce  output  consistent  with  a  given  set  of  O(n)  training 
examples.  We  show  NP-Completeness  by  reduction  from  Set-Splitting. 

Our  proof  involves  translating  the  learning  problem  into  a  geometrical  setting.  An  equivalent 
statement  of  our  result  is  that  it  is  NP-Complete  to  decide  whether  two  sets  of  boolean  vectors  in 
n-dimensional space can be linearly separated by two hyperplanes.

It  is  left  as  an  open  problem  to  extend  our  results  to  nodes  with  non-linear  functions  such  as 
sigmoids.
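
To make the decision problem concrete, the sketch below builds the kind of network described here (two hidden linear threshold nodes feeding one output threshold node) and checks whether a given choice of weights is consistent with a training set. Checking one candidate is easy; the hardness result concerns deciding whether any consistent choice exists. The data layout, the function names, and the XOR example are illustrative assumptions and are not part of the proof.

```python
def threshold_unit(weights, threshold, inputs):
    """Linear threshold function: 1 if w . x >= threshold, else 0."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0


def network_output(net, inputs):
    """Two hidden threshold nodes feed a single output threshold node."""
    hidden = [threshold_unit(w, t, inputs) for w, t in net["hidden"]]
    out_weights, out_threshold = net["output"]
    return threshold_unit(out_weights, out_threshold, hidden)


def is_consistent(net, examples):
    """True if the network reproduces the label of every training example."""
    return all(network_output(net, x) == label for x, label in examples)


# Hypothetical example with n = 2 inputs: XOR needs both hidden nodes.
xor_examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
xor_net = {
    "hidden": [((1, 1), 1), ((1, 1), 2)],  # first node computes OR, second AND
    "output": ((1, -1), 1),                # fires for OR and not AND
}
print(is_consistent(xor_net, xor_examples))
```

Geometrically, the two hidden nodes are the two hyperplanes of the equivalent statement above, and the output node decides which combination of half-spaces is labeled positive.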

P2.16  A  MASSIVELY  PARALLEL  SELF-TUNING  CONTEXT-FREE  PARSER 

EUGENE  SANTOS,  JR.,  Department  of  Computer  Science,  Brown  University 

The  Parsing  and  Learning  System  (PALS)  is  a  massively  parallel  self-tuning  context-free  parser.  It  is 
capable  of  parsing  sentences  of  unbounded  length  mainly  due  to  its  parse-tree  representation 
scheme.  The  system  is  capable  of  improving  its  parsing  performance  through  the  presentation  of 
training  examples. 

P2.17 A BACK-PROPAGATION ALGORITHM WITH OPTIMAL USE OF HIDDEN UNITS

YVES CHAUVIN, Thomson CSF, Inc. and Stanford University

This  paper  presents  a  variation  of  the  back-propagation  algorithm  that  makes  optimal  use  of  hidden 
units  by  decreasing  an  "energy"  term  written  as  a  function  of  the  squared  activations  of  these  hidden 
units. The algorithm can (1) automatically find nearly optimal architectures necessary to solve known
Boolean functions, (2) facilitate the interpretation of the activation of the remaining hidden units, (3)
eliminate the (0,0) local minimum while preserving the much faster convergence of the (-1,+1) logistic
activation function, and (4) automatically estimate the complexity of architectures appropriate for
phonetic labeling problems.
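
One way to picture the added "energy" term is as a penalty on hidden-unit activity that is summed with the usual output error, so that gradient descent is rewarded for switching unneeded hidden units off. The abstract does not give the functional form. In the sketch below the form h^2 / (1 + h^2), the weighting factor mu, and all function names are assumptions chosen only for illustration.

```python
def hidden_energy(hidden_activations):
    """Penalty on squared hidden activations (assumed form h^2 / (1 + h^2))."""
    return sum(h * h / (1.0 + h * h) for h in hidden_activations)


def total_cost(targets, outputs, hidden_activations, mu=0.1):
    """Squared output error plus mu times the hidden-unit energy term."""
    error = sum((t - o) ** 2 for t, o in zip(targets, outputs))
    return error + mu * hidden_energy(hidden_activations)


# Same output error, different hidden activity: the quieter solution costs less.
busy = total_cost([1.0], [0.9], hidden_activations=[0.8, -0.7, 0.9])
sparse = total_cost([1.0], [0.9], hidden_activations=[0.8, 0.0, 0.0])
print(busy, sparse)
```

With a penalty of this kind, two networks that fit the data equally well are distinguished by how many hidden units they keep active.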

P2.18  ANALYZING  THE  ENERGY  LANDSCAPES  OF  DISTRIBUTED  WINNER-TAKE-ALL 
NETWORKS 

DAVID  S.  TOURETZKY,  Computer  Science  Department,  Carnegie  Mellon  University 

When  winner-take-all  networks  appear  as  components  of  larger  connectionist  systems,  one  must 
make  compromises  in  setting  their  thresholds  to  ensure  good  overall  system  behavior.  This  proved  to 
be a significant problem when designing DCPS, Touretzky and Hinton's distributed connectionist
production  system.  An  analysis  of  the  shape  of  the  energy  landscape  explains  why  one  solution 
eventually  worked,  and  suggests  an  interesting  alternative.  The  first  approach,  rebiasing,  was 
employed  in  DCPS  after  the  network  had  settled  to  reject  choices  in  winner-take-all  spaces  that  were 
not  supported  by  adequate  evidence.  The  second  approach  enables  higher  thresholds  to  be  used  by 
clipping the corners of the state-space hypercube to keep the model from turning all its units off
during  the  annealing  search. 

P2.19 DYNAMIC, NON-LOCAL ROLE BINDINGS AND INFERENCING IN A LOCALIST
NETWORK  FOR  NATURAL  LANGUAGE  UNDERSTANDING 

TRENT E. LANGE, MICHAEL G. DYER, Artificial Intelligence Laboratory, Computer Science
Department,  University  of  California,  Los  Angeles 

The  inability  of  previous  distributed  and  localist  networks  to  robustly  handle  non-local  role-bindings 
has  limited  their  usage  in  higher-level  natural  language  understanding  systems.  This  paper 
introduces  a  means  to  handle  the  critical  problem  of  non-local  role-bindings  in  localist  spreading- 
activation  networks. 

Every  conceptual  node  in  the  network  has  associated  with  it  an  identification  node  broadcasting  a 
constant,  uniquely-identifying  activation,  called  its  signature.  Dynamic  role-bindings  are  represented 
with  nodes  whose  activations  match  the  signatures  of  the  bound  concepts.  Most  importantly,  the 
model  passes  these  signatures,  as  activation,  across  long  paths  of  nodes  to  handle  the  non-local 
role-bindings  necessary  for  inferencing.  We  claim  that  role-bindings  can  be  plausibly  represented 
with  groups  of  pacemaker  neurons  encoding  these  signature  activations. 

Using  these  abilities,  our  localist  network  model  is  able  to  robustly  represent  schemata  role-bindings 
and  thus  perform  the  inferencing,  plan/goal  analysis,  schema  instantiation,  word-sense 
disambiguation,  and  dynamic  re-interpretation  portions  of  the  natural  language  understanding 
process.

P2.20  SPREADING  ACTIVATION  OVER  DISTRIBUTED  MICROFEATURES 

JAMES  HENDLER,  Department  of  Computer  Science,  University  of  Maryland,  College  Park 

In  this  paper  we  demonstrate  that  an  activation  spreading  mechanism  can  be  used  to  probe  the 
internal  representations  built  by  a  distributed  connectionist  learning  algorithm.  We  demonstrate  that  a 
variant  of  marker-passing  can  be  used  to  perform  symbolic  inferencing  types  of  behavior,  in  the 
absence of a symbolic model, when activation is spread through the weight space learned by a back-
propagation algorithm. These sorts of inferences, previously made only by traditional AI
representations and structured connectionist networks, are necessary for providing distributed
networks with an ability to do the "subsymbolic inferencing" necessary for cognitive modeling.

P2.21  SHORT-TERM  MEMORY  AS  A  METASTABLE  STATE:  A  MODEL  OF  NEURAL 
OSCILLATOR  FOR  A  UNIFIED  SUBMODULE 

A.B. KIRILLOV, G.N. BORISYUK, R.M. BORISYUK, Ye.I. KOVALENKO, V.I. KRYUKOV, V.I.
MAKARENKO, V.A. CHULAEVSKY, Research Computer Center of the USSR Academy of Sciences,
Pushchino,  Moscow  Region 

A  new  model  of  a  controlled  neuron  oscillator,  proposed  earlier  for  the  interpretation  of  the  neural 
activity  in  various  parts  of  the  central  nervous  system,  may  have  important  applications  in  engineering 
and in the theory of brain functions. The oscillator has good stability of the oscillation period, its
frequency is regulated linearly in a wide range, and it can exhibit arbitrarily long oscillation periods
without changing the time constants of its elements. The latter is achieved by using the critical
slowdown  in  the  dynamics  arising  in  a  network  of  nonformal  excitatory  neurons.  By  changing  the 
parameters  of  the  oscillator  one  can  obtain  various  functional  modes  which  are  necessary  to  develop 
a  model  of  higher  brain  function. 

P2.22 STATISTICAL PREDICTION WITH KANERVA'S SPARSE DISTRIBUTED MEMORY

DAVID  ROGERS,  Research  Institute  for  Advanced  Computer  Science,  NASA  Ames  Research  Center 

A  new  viewpoint  of  the  processing  performed  by  Kanerva's  sparse  distributed  memory  (SDM)  is 
presented.  In  conditions  of  near-  or  over-capacity,  where  the  associative-memory  behavior  of  the 
model  breaks  down,  the  processing  performed  by  the  model  can  be  interpreted  as  that  of  a  statistical 
predictor.  Mathematical  results  are  presented  which  serve  as  the  framework  for  a  new  statistical 
viewpoint of the processing done by an SDM, and for which the standard formulation of SDM is a
special  case.  This  viewpoint  suggests  possible  enhancements  to  the  SDM  model,  including  a 
procedure for improving the predictiveness of the system, based on Holland's work with "Genetic
Algorithms," and a method for improving the capacity of SDM even when used as an associative
memory. 

P2.23  IMAGE  RESTORATION  BY  MEAN  FIELD  ANNEALING 

G.L.  BILBRO,  W.E.  SNYDER,  Department  of  Electrical  and  Computer  Engineering,  North  Carolina 
State  University,  Raleigh 

Minimization  by  stochastic  simulated  annealing  (SSA)  has  been  used  successfully  by  a  number  of 
authors  to  minimize  functions  of  many  variables,  even  in  the  presence  of  local  minima.  In  this  paper, 
a  new  minimization  strategy  is  formulated  in  which  the  Markov  random  process  of  SSA  is  replaced  by 
a  deterministic  minimization  step  followed  by  annealing.  Experiments  have  indicated  speedups  of  1-2 
orders  of  magnitude  using  this  new  mean  field  annealing  (MFA)  over  SSA  implementations  of  the 
same  problem. 

An abbreviated derivation of the MFA strategy is presented. Then, a particular objective function is
presented, one which minimizes the noise in an image while still preserving edges. The objective
function is cast as an MFA problem and solved. Experimental results are presented in which noisy
images are restored with sufficient accuracy to allow good estimates of second derivatives.
Favorable comparisons are made with techniques previously reported in the literature. Restorations
of  very  coarsely  sampled  images  (16x16)  are  also  presented,  in  which  the  noise  is  removed  without 
distorting  the  edges. 

The  application  of  MFA  to  image  restoration  may  be  implemented  on  a  locally-connected  neural  net. 
Implementation  issues  are  presented. 

P2.24  AUTOMATIC  LOCAL  ANNEALING 

JARED  LEINBACH,  Department  of  Psychology,  Carnegie  Mellon  University 

This research involves a method for finding global maxima in constraint satisfaction networks. It is an
annealing process but, unlike the Boltzmann Machine, requires no annealing schedule. Units
determine their temperature at each update based solely on information local to them, and thus all
processing is done at the unit level. The method outperforms the Boltzmann Machine in two
fundamental ways: 1) global maxima are found more quickly, and 2) the probability of having found a
global  maximum  always  approaches  1  as  the  number  of  cycles  of  processing  increases  (for  the 
Boltzmann  Machine  the  probability  of  having  found  a  global  maximum  stops  increasing,  at  some 
value  less  than  1,  when  the  temperature  reaches  zero).  Implementation  of  this  method  is  also 
computationally  trivial. 

P2.25  NEURAL  NETWORKS  FOR  MODEL  MATCHING  AND  PERCEPTUAL  ORGANIZATION 

ERIC  MJOLSNESS,  P.  ANANDAN,  Department  of  Computer  Science;  GENE  GINDI,  Department  of 
Electrical  Engineering,  Yale  University 

We  introduce  an  optimization  approach  for  solving  problems  in  computer  vision  that  involve  multiple 
levels  of  abstraction.  Specifically,  our  objective  functions  can  include  compositional  hierarchies 
involving object-part relationships and specialization hierarchies involving object-class relationships.
The  large  class  of  vision  problems  that  can  be  subsumed  by  this  method  includes  traditional  model 
matching,  perceptual  grouping,  dense  field  computation  (regularization),  and  even  early  feature 
detection  which  is  often  formulated  as  a  simple  filtering  operation.  Our  approach  involves  casting  a 
variety  of  vision  problems  as  inexact  graph  matching  problems,  formulating  graph  matching  in  terms 
of  constrained  optimization,  and  using  analog  neural  networks  to  perform  the  constrained 
optimization.  We  will  show  the  application  of  this  approach  to  shape  recognition  in  a  domain  of 
stick-figures  and  to  the  perceptual  grouping  of  line  segments  into  long  lines. 

P2.26  ON  THE  K-WINNERS-TAKE-ALL  FEEDBACK  NETWORK  AND  APPLICATIONS 

ERIC  MAJANI,  RUTH  ERLANSON,  YASER  ABU-MOSTAFA,  Jet  Propulsion  Laboratory,  California 
Institute  of  Technology 

We present a rigorous analysis of the k-Winners-Take-All Feedback Network. We show that the
slope G of the sigmoid at the origin has to be above a value of at least (N - 1) for the network to
function properly (N is the number of nodes). In the limit of an infinite slope, we show that the only
stable states of the network are the vectors with k (+1)'s and (N - k) (-1)'s, and that the convergence
towards  the  stable  state  occurs  in  a  maximum  likelihood  fashion.  Finally,  we  use  these  networks  for 
the  soft  decision  decoding  of  Simplex  Codes  as  well  as  Associative  Memories. 

P2.27  AN  ADAPTIVE  NETWORK  THAT  LEARNS  SEQUENCES  OF  TRANSITIONS 

C.L. WINTER, Science Applications International Corp., Tucson, AZ

We  describe  an  adaptive  network  (TIN-2)  that  learns  the  transition  function  of  an  arbitrary  finite-state 
automaton  from  observations  of  its  real-time  behavior.  During  training  it  abstracts  transition  functions 
from  noisy  data,  while  in  operation  it  produces  sequences  of  transitions  in  response  to  variations  in 
input.  Memory  dynamics  are  based  on  a  modified  version  of  Adaptive  Resonance  Theory.  Individual 
F2 nodes learn to recognize unique current state/next state associations and all external inputs which
have evoked them. We give results from experiments in which TIN-2 learns to balance parentheses
in  simple  algebraic  expressions  from  example  expressions. 

P2.28  CONVERGENCE  AND  PATTERN-STABILIZATION  IN  THE  BOLTZMANN  MACHINE 

MOSHE  KAM,  Department  of  Electrical  and  Computer  Engineering,  Drexel  University;  ROGER 
CHENG,  Department  of  Electrical  Engineering,  Princeton  University 

The  most  common  application  of  the  Boltzmann  Machine  is  in  global  optimization  with  multimodal 
objective  functions  through  the  employment  of  simulated  annealing.  When  operating  at  a  constant 
temperature,  the  machine  could  be  used  for  unambiguous  associative  pattern  retrieval,  through 
exploitation of its ability to escape from local minima. Through a learning algorithm, a set of known
codewords  is  installed  in  the  network’s  state  space  as  a  set  of  local  minima.  An  appropriate  proximity 
criterion  is  used  to  associate  any  binary  tuple  that  comes  close  enough  to  a  stored  codeword  with  this 
codeword,  effectively  creating  regions  of  attraction  around  each  taught  binary  pattern.  Spurious  local 
minima, which become non-interpretable "traps" in the asynchronous deterministic model, are
skipped, and their effect on information retrieval is demonstrated only in delaying the machine in the
(usually shallow) spurious "valleys" in the energy landscape before moving towards a "legal"
interpretable  minimum. 

We  formulate  the  Hamming  distance  from  a  stored  pattern  of  a  dynamic  Boltzmann  machine  as  a 
birth-and-death  Markov  chain,  and  find  limits  on  the  error-correcting  capabilities  of  the  resulting 
content-addressable  memory  in  terms  of  retrieval  probabilities  and  retrieval  time.  Steady  state 
partition of the memory is studied through the process' limit state probabilities. In passing, we
examine the role of the incremental Hebbian rule as a learning scheme for the machine, and interpret
it as a steepest-descent algorithm, maximizing pattern stabilization during training. The results apply
to the assessment of coding efficiency for representing information to-be-stored, and to quantifying
learning  algorithms  and  association  rules  for  both  the  Boltzmann  machine  and  the  asynchronous  net 
of  binary  threshold  elements. 
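
The limit state probabilities of a birth-and-death chain follow from detailed balance between neighboring states, which is what makes this formulation convenient. The sketch below computes them for a chain whose state is read as the Hamming distance from a stored pattern. The transition probabilities (p_away, p_toward, and their dependence on distance) are hypothetical placeholders, not the ones derived for the Boltzmann machine.

```python
def stationary_distribution(birth, death):
    """Limit state probabilities of a birth-and-death chain on states 0..n.

    birth[i] is P(i -> i + 1) and death[i] is P(i + 1 -> i), for i = 0..n-1.
    Detailed balance gives pi[i + 1] = pi[i] * birth[i] / death[i].
    """
    weights = [1.0]
    for b, d in zip(birth, death):
        weights.append(weights[-1] * b / d)
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical chain: state = Hamming distance from a stored n-bit codeword.
n = 8
p_away, p_toward = 0.1, 0.6  # assumed per-step flip probabilities
birth = [(n - i) / n * p_away for i in range(n)]      # a correct bit flips
death = [(i + 1) / n * p_toward for i in range(n)]    # an erroneous bit is fixed
pi = stationary_distribution(birth, death)
print(f"P(distance 0) = {pi[0]:.4f}")
```

In this reading, pi[0] is the long-run fraction of time the machine sits exactly on the stored pattern, and the tail of pi shows how much probability mass remains at larger distances.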

POSTER SESSION P2C
IMPLEMENTATION 


P2.29 MOS CHARGE STORAGE OF ADAPTIVE NETWORKS

R.E. HOWARD, D.B. SCHWARTZ, AT&T Bell Laboratories, Holmdel, NJ

We  have  developed  fully  analog  adaptive  network  chips  which  store  the  weights  as  charge  upon 
MOS  capacitors.  The  weights  are  changed  by  moving  charge  between  a  pair  of  capacitors  with  a 
string  of  charge  transfer  transistors  that  mimic  CCD's.  The  charge  transfer  mechanism  provides  a 
resolution  of  8  bits  plus  sign  and  implements  weight  decay  simply.  This  resolution  can  be  held  for  at 
least 20 seconds at room temperature, allowing ample time for refresh, and can be held indefinitely if
the chips are cooled. A 2.5μ CMOS chip with 128 weights and separate test structures has been
tested and testing has begun on a 1.25μ version with 1104 weights.

P2.30  A  SELF-LEARNING  NEURAL  NETWORK 

A.  HARTSTEIN,  R.H.  KOCH,  IBM  T.J.  Watson  Research  Center,  Yorktown  Heights,  NY 

We propose a new neural network structure that is compatible with silicon technology and has built-in
learning  capability.  This  network  has  the  feature  that  the  learning  parameter  is  embodied  in  the 
thresholds  of  MOSFET  devices  and  is  local  in  character.  The  network  is  shown  to  be  capable  of 
learning  by  example  as  well  as  exhibiting  the  desirable  features  of  the  Hopfield  type  networks. 

P2.31  AN  ANALOG  VLSI  CHIP  FOR  CUBIC  SPLINE  SURFACE  INTERPOLATION 

JOHN  G.  HARRIS,  Division  of  Computation  and  Neural  Systems,  California  Institute  of  Technology 

This paper describes an analog VLSI chip for smooth surface interpolation. An eight-node 1D
network  was  designed  in  3um  CMOS  [Mead  1988].  Subtract  constraint  devices  and  a  resistor  mesh 
are  used  to  interpolate  a  dense  surface  from  sparse  depth  constraints  provided  by  a  stereo  module. 
The  cubic  spline  interpolant  used  by  this  chip  matches  the  results  of  psychophysics  experiments  with 
random  dot  stereograms. 

P2.32  ANALOG  IMPLEMENTATION  OF  SHUNTING  NEURAL  NETWORKS 

BAHRAM  NABET,  ROBERT  B.  DARLING,  ROBERT  B.  PINTER,  Department  of  Electrical 
Engineering,  University  of  Washington,  Seattle 

We  propose  an  extremely  compact,  all  analog  and  fully  parallel  implementation  of  a  shunting 
cooperative-competitive  recurrent  neural  network  that  is  applicable  to  a  wide  variety  of  FET-based 
integration  technologies.  While  the  contrast  enhancement,  data  compression,  and  adaptation  to 
mean  input  intensity  capabilities  of  the  network  are  well  suited  for  processing  of  sensory  information 
or  feature  extraction  for  a  content  addressable  memory  (CAM)  system,  the  network  also  admits  a 
global  Liapunov  function  and  can  thus  achieve  stable  CAM  storage  itself.  In  addition,  the  model  can 
readily  function  as  a  front-end  processor  to  an  analog  adaptive  resonance  circuit. 

P2.33 STABILITY OF ANALOG NEURAL NETWORKS WITH TIME DELAY

C.M. MARCUS, R.M. WESTERVELT, Division of Applied Sciences and Department of Physics,
Harvard  University 

Analog  neural  networks  designed  to  converge  to  fixed  points  can  oscillate  when  time  delay  is  present. 
This  is  an  important  consideration  in  building  hardware  networks  where  switching  delay  can  be 
comparable  to  the  relaxation  time  of  the  circuit.  We  consider  stability  in  networks  of  saturable 
amplifiers  (neurons)  with  delayed  output.  Our  results  are  based  on  linear  stability  analysis  about  the 
fixed  point  where  all  neurons  have  maximum  gain.  We  focus  on  symmetrically  connected  networks 
which  are  stable  when  the  delay  is  zero,  and  show  that  above  a  critical  value  of  delay  an  attractor  for 
sustained  oscillation  appears.  Our  results  can  be  formulated  as  a  stability  criterion  that  depends  on 
the  size  and  relaxation  time  of  the  network,  the  connection  topology,  and  the  delay  and  gain  of  the 
neurons.  We  apply  the  stability  criterion  to  several  connection  topologies  and  show  that  the  most 
unstable  configuration  is  the  all-inhibitory  network  and  that  Hebb  rule  networks  are  stable  even  for 
large  delays.  We  also  consider  stability  in  random  symmetric  networks. 

Results  of  the  stability  analysis  agree  well  with  numerical  integration  of  the  delay  equations  and 
experiments  on  a  small  (8  neuron)  analog  network  that  includes  adjustable  time  delay  based  on
charge  coupled  device  circuitry. 

P2.34  ANALOG  SUBTHRESHOLD  VLSI  CIRCUIT  FOR  INTERPOLATING  SPARSELY 
SAMPLED  2-D  SURFACES  USING  RESISTIVE  NETWORKS 

JIN  LUO,  CHRISTOF  KOCH,  CARVER  MEAD,  California  Institute  of  Technology

Interpolating  and  smoothing  sparsely  sampled  and  noisy  surface  data  is  a  well-known  problem  in
computer  vision  (Grimson,  1981).  It  can  be  shown  to  be  equivalent  to  minimizing  a  quadratic
variational  functional.  This  functional  maps  onto  very  simple  resistive  networks,  such  that  the  steady
state  voltage  distribution  corresponds  to  the  interpolated  and  smoothed  surface  (Koch,  Marroquin  and
Yuille,  1986).  We  have  implemented  such  a  network  using  analog,  subthreshold  CMOS  VLSI
technology  (Mead,  1988)  and  report  here  for  the  first  time  its  full  two-dimensional  operation  using  real 
data. 
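
As an illustrative aside (not the authors' circuit), the steady state of a resistive network of this kind can be sketched in software. The Python sketch below assumes a simple membrane-type grid: every node is tied to its four neighbours through unit resistors, and each sampled node is tied to its data value through a conductance. The names relax_grid, data_conductance and sweeps are hypothetical, and Gauss-Seidel relaxation stands in for the physical settling of the node voltages.

```python
def relax_grid(data, data_conductance=1.0, sweeps=2000):
    """Relax a resistive grid toward its steady-state node voltages.

    data: 2-D list holding a float at each sampled node and None elsewhere
          (at least one node must be sampled).
    data_conductance: assumed conductance tying a sampled node to its datum;
          larger values trust the data more, smaller values smooth more.
    """
    rows, cols = len(data), len(data[0])
    known = [d for row in data for d in row if d is not None]
    start = sum(known) / len(known)          # start every node at the data mean
    v = [[start] * cols for _ in range(rows)]
    for _ in range(sweeps):
        for r in range(rows):
            for c in range(cols):
                total, weight = 0.0, 0.0
                # Current balance with the four grid neighbours (unit resistors).
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        total += v[rr][cc]
                        weight += 1.0
                # Sampled nodes are also pulled toward their measured value.
                if data[r][c] is not None:
                    total += data_conductance * data[r][c]
                    weight += data_conductance
                v[r][c] = total / weight
    return v


# Usage: two depth samples at the ends of a five-node chain.
surface = relax_grid([[0.0, None, None, None, 4.0]], data_conductance=1e6)
```

Each update replaces a node voltage by the conductance-weighted average of what it is connected to, which is the zero-net-current condition at that node; the fixed point therefore minimizes a quadratic energy of the assumed membrane form.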

P2.35  GENERAL  PURPOSE  NEURAL  ANALOG  COMPUTER 

PAUL  MUELLER,  JAN  VAN  DER  SPIEGEL,  DAVID  BLACKMAN,  JOE  DAO,  CHRIS  DONHAM,  ROY
FURMAN,  DZU  PU  HSIEH,  MARC  LOINAZ,  Departments  of  Biochemistry  and  Biophysics  and 
Electrical  Engineering,  University  of  Pennsylvania 

We  have  designed  a  neural  analog  computer  and  are  in  the  process  of  fabricating  its  components. 
The  machine  is  assembled  from  separate  modules  consisting  of  neuron  arrays,  variable  gain  synaptic 
arrays,  and  switchable  axon  arrays.  The  machine  runs  entirely  in  analog  mode  but  all  parameters 
(neuron  thresholds  and  time  constants,  connections  and  synaptic  weights)  are  under  digital  control.

Each  neuron  has  a  limited  number  of  inputs  (128),  and  can  connect  to  any  other  neuron  but  not  every 
neuron  can  connect  to  every  other  neuron.  Connections  (via  axons),  neuron  parameters  (threshold, 
time  constants)  and  synaptic  gains  (weights)  are  set  from  digital  processors,  each  processor  serving 
a  section  of  neurons,  axons  and  synapses. 

Neuron  arrays  are  arranged  in  rows  and  columns  and  are  surrounded  by  synaptic  and  axon  arrays. 
For  determining  synaptic  weights  (learning  mode),  outputs  from  neurons  are  multiplexed,  A/D
converted  and  stored  in  digital  memory. 

Learning  algorithms  or  connection  architectures  are  generated  by  a  central  digital  computer  that 
serves  each  section  processor.  Because  of  its  modular  design  the  machine  can  be  expanded  to  any
size. 

P2.36  A  SILICON  BASED  PHOTORECEPTOR  SENSITIVE  TO  SMALL  CHANGES  IN  LIGHT
INTENSITY 

C.A.  MEAD,  T.  DELBRUCK,  California  Institute  of  Technology 

We  describe  a  silicon-based  photoreceptor  circuit  that  is  sensitive  to  small  changes  in  the  incident 
light  intensity.  The  idea  for  the  circuit  came  from  a  suggestion  by  Frank  Werblin  that  biological  retinas
may  achieve  great  sensitivity  to  small  changes  in  incident  intensity  by  feeding  back  a  filtered  version
of  the  output  signal.  A  comparison  between  measurements  of  temporal  contrast  sensitivity  for  the
circuit  and  for  the  human  eye,  measured  psychophysically,  is  presented,  and  it  is  shown  that  both
obey  Weber’s  law  and  that  the  contrast  sensitivities  are  nearly  the  same  when  measured  in  units  of 
incident  intensity.  We  discuss  how  the  circuit  achieves  this  in  terms  of  the  small  signal  gain  control. 

P2.37  A  DIGITAL  REALISATION  OF  SELF-ORGANISING  MAPS 

M.J.  JOHNSON,  N.M.  ALLINSON,  K.  MOON,  Department  of  Electronics,  University  of  York,  England 

A  novel  digital  realization  for  a  self-organising  feature  map  is  proposed.  This  is  shown  to  be 
equivalent  to  the  analogue  form  and  has  been  used  to  create  very  large  feature  maps  for  image 
primitives.  A  256x256  feature  map  of  32x32  weight  elements  was  found  to  produce  an  adapted 
output  map  in  under  50ms  using  a  single  processor  running  at  four  MIPS,  and  with  a  memory 
requirement  of  approximately  two  Mbytes. 

P2.38  TRAINING  OF  A  LIMITED-INTERCONNECT,  SYNTHETIC  NEURAL  IC

M.R.  WALKER,  L.A.  AKERS,  Center  for  Solid-State  Electronics  Research,  Arizona  State  University, 
Tempe

This  paper  reports  on  the  development  of  a  non-algorithmic  training  paradigm  for  a
limited-interconnect,  multi-layered  perceptron-like  network  implemented  using  standard  CMOS  design  rules.
The  network  is  isomorphic  to  fully-connected  layered  architectures,  but  satisfies  the  interconnection
length  and  density  constraints  imposed  by  VLSI  technology.  The  network  is  composed  of  512
compact  analog  processing  elements,  each  of  which  modulates  inputs  with  analog  synaptic
transmittance  values  and  then  sums  the  post-synaptic  signals  on  the  gate  of  a  CMOS  double  inverter. 
The  training  paradigm  is  a  modified  version  of  back-propagation  which  accommodates  the  binary- 
state  processing  elements  and  the  limited  range  of  synaptic  weight  values  imposed  by 
microelectronic  constraints.  Simulation  results  are  presented  which  demonstrate  the  ability  of  this 
paradigm  to  produce  a  network  which  is  fault  tolerant  and  capable  of  generalization. 

P2.39  ELECTRONIC  RECEPTORS  FOR  TACTILE  SENSING 

ANDREAS  G.  ANDREOU,  Department  of  Electrical  and  Computer  Engineering,  Johns  Hopkins 
University 

In  this  paper,  we  discuss  electronic  receptors  for  tactile  sensing.  These  are  based  on  magnetic  field 
sensors,  both  Hall-effect  structures  and  magneto-transistors,  fabricated  using  standard  CMOS
technologies,  and  integrated  with  additional  electronics  to  perform  local  processing.  Integrated  arrays
of  these  receptors,  biased  with  a  small  permanent  magnet,  can  sense  the  local  distortion  in  the
magnetic  field  due  to  paramagnetic  objects  near  the  surface  of  the  chip.  The  sensitivity,  spatial 
resolution  and  frequency  response  of  different  receptors  fabricated  through  MOSIS  will  be  discussed. 
The  above  performance  criteria  will  be  compared  with  the  characteristics  of  different  types  of 
biological  tactile  receptors. 

P2.40  OPTICAL  EXPERIMENTS  IN  LEARNING,  COOPERATION,  AND  COMPETITION  WITH
CONTINUOUS,  DYNAMIC  HOLOGRAPHIC  MEDIA 

JEFF  L.  ORREY,  MIKE  J.  O'CALLAGHAN,  PETER  J.  MARTIN,  DIANA  M.  LININGER,  DANA 
Z.  ANDERSON,  Joint  Institute  for  Laboratory  Astrophysics,  University  of  Colorado,  Boulder 

We  present  two  classes  of  optical  neural  networks.  In  an  electrooptic  version  we  demonstrate  delta 
rule  learning,  with  synaptic  weights  stored  and  updated  in  a  dynamic  volume  holographic  medium. 
Updating  consists  of  adjusting  the  diffraction  efficiency  of  the  hologram  via  a  liquid  crystal  spatial  light 
modulator  and  microcomputer.  For  an  all  optical  nonlinear  circuit  with  active  gain  and  loss  we 
discuss  competitive  and  cooperative  interactions  between  modes.  We  present  experimental  results 
illustrating  mode  dynamics,  including  bistability,  hysteresis  and  relaxation  oscillations. 

ORAL  SESSION  04
STRUCTURED  NETWORKS 


2:20  04.1  SYMBOL  PROCESSING  IN  THE  BRAIN 

GEOFFREY  HINTON,  Computer  Science  Department,  University  of  Toronto 
Invited  talk. 


3:00  04.2  IMPLICATIONS  OF  RECURSIVE  DISTRIBUTED  REPRESENTATIONS 

JORDAN  POLLACK,  New  Mexico  State  University,  Las  Cruces 

I  will  describe  my  recent  results  on  the  automatic  development  of  fixed-width  distributed 
representations  of  variable-sized  recursive  and  sequential  data  structures:  Recursive  Auto- 
Associative  Memory  (RAAM),  which  implements  Hinton’s  idea  of  reduced  descriptions. 

The  first  implication  of  this  work  is  that  certain  types  of  AI-style  data-structures  can  now  be
represented  in  fixed-width  analog  vectors.  Simple  inferences  and  transformations  can  be  done 
quickly  by  the  type  of  pattern  associations  that  neural  networks  excel  at,  thus  avoiding  the 
combinatorial  inefficiencies  of  variables,  unification,  or  data-restructuring. 

The  second  implication  is  that  these  representations  must  become  self-similar  in  the  limit.  Once  this 
door  to  chaos  is  opened,  many  interesting  new  questions  about  intelligence  can  (and  will)  be 
discussed. 


3:30  04.3  LEARNING  SEQUENTIAL  STRUCTURE  IN  SIMPLE  RECURRENT  NETWORKS 

DAVID  SERVAN-SCHREIBER,  AXEL  CLEEREMANS,  JAMES  L.  MCCLELLAND,  Departments  of 
Computer  Science  and  Psychology,  Carnegie  Mellon  University 

This  paper  reports  a  study  of  learning  in  simple  recurrent  networks  previously  studied  by  Elman 
(1988).  In  these  networks,  the  pattern  of  activation  developed  on  a  hidden  layer  at  t-1  and  the  event 
that  occurred  at  t-1  are  used  to  predict  the  event  that  will  occur  at  t.  We  trained  the  network  on  a  set
of  letter  strings  of  restricted  length  from  a  simple  artificial  grammar.  After  training,  the  network  was 
able  to  predict  possible  successors  of  each  letter  in  the  training  set  and  generalized  well  to  other 
strings  conforming  to  the  grammar  and  length  restrictions.  Cluster  analyses  of  the  hidden  unit 
patterns  showed  that  they  encode  prediction-relevant  information  about  the  path  traversed  through 
the  grammar.  We  provide  a  description  of  the  different  phases  of  learning,  illustrated  with  cluster 
analyses,  and  we  note  some  conditions  under  which  the  simple  recurrent  network  will  fail  to  master  a 
set  of  training  sequences. 
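
For readers unfamiliar with the architecture, the prediction step described above can be sketched as follows. This is a minimal, untrained Python illustration, not the authors' implementation: the weights are random, the logistic hidden units and softmax output are assumptions, and the names make_srn and srn_predict are hypothetical. It shows only how the hidden pattern from t-1 is fed back as context alongside the current symbol.

```python
import math
import random


def make_srn(n_symbols, n_hidden, seed=0):
    """Build random (untrained) weights for a small simple recurrent network."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "w_in": matrix(n_hidden, n_symbols),   # current symbol -> hidden
        "w_ctx": matrix(n_hidden, n_hidden),   # previous hidden -> hidden
        "w_out": matrix(n_symbols, n_hidden),  # hidden -> next-symbol scores
    }


def srn_predict(net, sequence, n_symbols):
    """Return one next-symbol distribution per element of `sequence`."""
    n_hidden = len(net["w_ctx"])
    context = [0.0] * n_hidden               # hidden pattern from the previous step
    predictions = []
    for symbol in sequence:
        x = [1.0 if i == symbol else 0.0 for i in range(n_symbols)]  # one-hot input
        hidden = []
        for j in range(n_hidden):
            s = sum(w * xi for w, xi in zip(net["w_in"][j], x))
            s += sum(w * ci for w, ci in zip(net["w_ctx"][j], context))
            hidden.append(1.0 / (1.0 + math.exp(-s)))                # logistic unit
        scores = [sum(w * h for w, h in zip(row, hidden)) for row in net["w_out"]]
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]                  # stable softmax
        total = sum(exps)
        predictions.append([e / total for e in exps])
        context = hidden                      # copy hidden layer into the context
    return predictions


# Usage: five symbols, three hidden units, one short symbol sequence.
distributions = srn_predict(make_srn(5, 3), [0, 1, 2, 1], 5)
```

Training (for example by back-propagating the prediction error at each step) is deliberately omitted; the sketch covers only the forward pass.
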

4:00  BREAK


4:20  04.4  SHORT-TERM  MEMORY  AS  A  METASTABLE  STATE:  "NEUROLOCATOR,"  A  MODEL
OF  ATTENTION

V.I.  KRYUKOV,  Research  Computer  Center,  USSR  Academy  of  Sciences,  Pushchino,  Moscow
Region 

A  most  important  consequence  of  our  theory  of  phase  transitions  in  the  brain  is  the  prediction  that
the  CNS  contains  a  phase-locked  tracking  system  for  controlling  attention  and  memory  in  the  frequency
range  of  alpha-  and  theta-rhythms.  This  paper  describes  a  simplified  model  of  such  a  system  and
derives  a  basic  integro-differential  equation  for  its  functioning  which  is  almost  identical  to  the  equation
for  the  phase-locked  loop  (PLL)  well  known  in  communications.  Dynamical  properties  of  this  system
are  briefly  discussed  to  account  for  experimental  data  which  are  difficult  to  interpret  in  terms  of
the  existing  models.


4:50  04.5  HETEROGENEOUS  NEURAL  NETWORKS  FOR  ADAPTIVE  BEHAVIOR  IN  DYNAMIC
ENVIRONMENTS

RANDALL  D.  BEER,  LEON  S.  STERLING,  Departments  of  Computer  Engineering  and  Science  and
Center  for  Automation  and  Intelligent  Systems  Research;  HILLEL  J.  CHIEL,  Department  of  Biology
and  Center  for  Automation  and  Intelligent  Systems  Research,  Case  Western  Reserve  University 

Recent  research  in  artificial  neural  networks  has  generally  focused  on  uniform  architectures,  i.e., 
homogeneous  networks  consisting  of  simple  units  with  a  regular  interconnection  scheme.  In  contrast, 
even  simple  biological  neural  networks  exhibit  great  heterogeneity  in  both  their  elements  and  their 
patterns  of  interconnection.  We  argue  for  heterogeneity  in  artificial  neural  networks  by  describing  a 
simple  heterogeneous  artificial  neural  network  for  controlling  the  walking  of  a  six-legged  "organism"  in
a  simulated  environment.  This  controller  is  based  on  the  design  of  neural  networks  found  in 
biological  organisms  and  is  capable  of  adapting  to  traumatic  changes,  such  as  the  removal  of  a  leg, 
as  a  natural  consequence  of  its  design. 


5:20  04.6  A  LINK  BETWEEN  MARKOV  MODELS  AND  MULTILAYER  PERCEPTRONS 

H.  BOURLARD,  C.J.  WELLEKENS,  Philips  Research  Laboratory,  Brussels 

In  the  Hidden  Markov  Models,  commonly  used  for  speech  recognition,  local  probabilities  are 
associated  with  states  or  with  transitions  between  states.  The  incorporation  of  contextual  information 
or  discriminant  properties  in  these  probabilities  heavily  complicates  the  training  algorithm  of  the 
models.  This  problem  is  circumvented  by  using  a  Multilayer  Perceptron  with  feedback  to  generate 
highly  discriminant  and  largely  context  dependent  probabilities. 


6:30  RECEPTION  (CASH  BAR) 


7:30  CONFERENCE  BANQUET 


9:00  PLENARY  SPEAKER:  NEURAL  ARCHITECTURE  AND  FUNCTION 

VALENTINO  BRAITENBERG,  Max  Planck  Institut  für  Biologische  Kybernetik,  West  Germany

ORAL  SESSION  05
IMPLEMENTATION 


8:30  05.1  ROBOTICS,  MODULARITY,  AND  LEARNING 

RODNEY  BROOKS,  Artificial  Intelligence  Laboratory,  Massachusetts  Institute  of  Technology 
Invited  talk. 


9:10  05.2  WINNER-TAKE-ALL  NETWORKS  OF  O(N)  COMPLEXITY

J.  LAZZARO,  S.  RYCKEBUSCH,  M.A.  MAHOWALD,  C.A.  MEAD,  Computational  Neural  Systems
Program,  California  Institute  of  Technology

Activity  in  neural  systems  is  mediated  by  two  general  types  of  inhibition:  subtractive  inhibition,  which 
may  be  thought  of  as  setting  the  zero  level  for  the  computation,  and  multiplicative  (non-linear)
inhibition  which  regulates  the  gain  of  the  computation.  We  report  a  physical  realization  of  general 
nonlinear  inhibition  in  its  extreme  form,  known  as  winner-take-all.  We  have  designed,  fabricated,  and 
tested  a  series  of  compact  CMOS  integrated  circuits  which  realize  the  winner-take-all  function.  These 
analog,  continuous-time  circuits  use  only  O(n)  of  interconnect  to  perform  this  function.

We  have  also  modified  this  global  winner-take-all  circuit,  realizing  a  circuit  that  computes  local 
nonlinear  inhibition.  Local  inhibitory  circuits  are  well  suited  for  use  in  systems  which  topographically 
represent  a  feature  space  and  which  process  several  features  in  parallel.  We  have  designed, 
fabricated,  and  tested  a  CMOS  integrated  circuit  which  combines  the  function  of  the  winner-take-all 
circuit  and  a  nonlinear  resistive  network  to  locally  compute  the  winner-take-all  function  of  spatially
ordered  input.  The  circuit  is  composed  of  a  one  dimensional  array  of  elements,  which  interact  with 
nonlinear  lateral  inhibition.  Since  the  competitive  interactions  are  local,  multiple  winners  can  occur 
within  the  array. 
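
The difference between the global and local functions can be illustrated in software. The Python sketch below is an idealization, not a model of the CMOS circuits: it replaces the nonlinear resistive network with a hard neighbourhood window, and the names winner_take_all, local_winner_take_all and radius are hypothetical.

```python
def winner_take_all(inputs):
    """Global winner-take-all: only the single largest input survives."""
    winner = max(range(len(inputs)), key=lambda i: inputs[i])  # first max on ties
    return [x if i == winner else 0.0 for i, x in enumerate(inputs)]


def local_winner_take_all(inputs, radius=1):
    """Local winner-take-all over a one-dimensional array.

    An element survives if it is the largest within `radius` positions of
    itself, so several spatially separated winners can coexist.
    """
    outputs = []
    for i, x in enumerate(inputs):
        lo, hi = max(0, i - radius), min(len(inputs), i + radius + 1)
        outputs.append(x if x >= max(inputs[lo:hi]) else 0.0)
    return outputs


# Usage: the same input has one global winner but two local winners.
signal = [0.2, 0.9, 0.4, 0.1, 0.7, 0.3]
global_result = winner_take_all(signal)          # [0.0, 0.9, 0.0, 0.0, 0.0, 0.0]
local_result = local_winner_take_all(signal, 1)  # [0.0, 0.9, 0.0, 0.0, 0.7, 0.0]
```

In this idealization, tied maxima inside a window all survive; a physical circuit would resolve such ties through device mismatch.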


9:40  05.3  AN  ANALOG  SELF-ORGANIZING  NEURAL  NETWORK  CHIP 

J.  MANN,  S.  GILBERT,  Massachusetts  Institute  of  Technology  Lincoln  Laboratory 

A  design  for  a  fully  analog  version  of  a  self-organizing  feature  map  neural  network  has  been
completed.  Several  parts  of  this  design  are  in  fabrication.  The  feature  map  algorithm  was  modified  to 
accommodate  circuit  solutions  to  the  various  computations  required.  Performance  effects  were 
measured  by  simulating  the  design  as  part  of  a  frontend  for  a  speech  recognition  system.  Circuits  are 
included  to  implement  both  activation  computations  and  weight  adaptation  or  learning.  External 
access  to  the  analog  weight  values  is  provided  to  facilitate  weight  initialization,  testing  and  static 
storage.  This  fully  analog  implementation  requires  an  order  of  magnitude  less  area  than  a 
comparable  digital/analog  hybrid  version  developed  earlier. 

05.4  PERFORMANCE  OF  A  STOCHASTIC  LEARNING  MICROCHIP

JOSHUA  ALSPECTOR,  BHUSAN  GUPTA*,  ROBERT  B.  ALLEN,  Bellcore,  Morristown,  NJ
(*permanently  at  Department  of  Electrical  Engineering,  University  of  California,  Berkeley)

We  have  fabricated  a  test  chip  in  2  micron  CMOS  that  can  test  the  function  of  an  electronic  parallel 
network  meant  to  perform  supervised  learning  in  a  manner  similar  to  the  Boltzmann  machine.  The 
function  of  the  chip  components  is  explained  and  the  performance  is  assessed.  The  chip  learns  to
solve  the  XOR  problem  in  a  few  milliseconds.  Future  plans  for  scaling  the  circuit  up  to  useful  size  are 
discussed. 


05.5  A  FAST,  NEW  SYNAPTIC  MATRIX  FOR  OPTICALLY  PROGRAMMED  NEURAL 
NETWORKS 

C.D.  KORNFELD,  R.C.  FRYE,  C.C.  WONG,  E.A.  RIETMAN,  AT&T  Bell  Laboratories,  Murray  Hill,  NJ

We  report  on  the  design,  construction  and  operation  of  a  large,  optically  programmed  neural  network 
which  uses  a  new  synapse  structure  that  has  substantially  better  operating  characteristics  than
those  reported  in  our  earlier  papers.  These  synaptic  arrays  are  somewhat  more  difficult  to  fabricate 
than  our  early  devices  because  they  require  additional  photolithography  steps  and  use  a  dielectric 
isolation  layer.  The  resulting  arrays  have  symmetric  behavior  for  both  activating  and  inhibiting 
synapse  types.  They  also  exhibit  linear  response  to  changes  in  applied  voltage.  Preliminary 
measurements  indicate  that  these  devices  are  nearly  1000  times  faster  than  our  earlier  devices. 

In  this  paper  we  will  describe  these  new  devices  and  will  compare  them  to  earlier  designs.  We  will
also  discuss  tradeoffs  that  can  be  made  in  the  choice  of  materials  and  geometries  used  in  these
arrays.  Next,  we  will  describe  how  these  tradeoffs  impact  system  configurations  and  potential
applications.  Finally,  we  will  describe  a  complete  neural  network  implementation  using  these  new
arrays. 


05.6  PROGRAMMABLE  ANALOG  PULSE-FIRING  NEURAL  NETWORKS 

ALAN  F.  MURRAY,  Department  of  Electrical  Engineering,  University  of  Edinburgh,  Scotland;  LIONEL
TARASSENKO,  Department  of  Engineering  Science,  University  of  Oxford,  England;  ALISTER 
HAMILTON,  Department  of  Electrical  Engineering,  Napier  College  of  Commerce  and  Technology, 
Edinburgh,  Scotland 

We  describe  pulse-stream  firing  VLSI  devices,  supported  by  digital,  on-chip  memory  for  synaptic
weights,  that  form  asynchronous,  essentially  analog  neural  networks.  Synaptic  weights  are  held  in
off-chip  digital  RAM,  and  used  to  charge  on-chip  dynamic  analog  storage  capacitors  through  a
digital-to-analog  converter.  Synaptic  weighting  uses  time-division  of  the  neural  pulses  from  a  signalling
neuron  to  a  receiving  neuron.  MOS  transistors  in  their  "ON"  state  act  as  variable  resistors  to  control  a
capacitive  discharge,  and  time-division  is  thus  achieved  by  a  small  synapse  circuit  cell.  The  VLSI
chip  set  design  uses  3  µm  CMOS  technology.